Amazon EMR allocates container with 1 core on slaves - hadoop-yarn

I am seeing strange behaviour where YARN cannot allocate all containers properly.
I have maximizeResourceAllocation set to true, so if I start an m4.4xlarge slave instance it requests all 32 cores from YARN for each executor container.
However, that fails for one container because the ApplicationMaster process uses 1 core:
1 container with 1 vcore for the ApplicationMaster process
32-core container for executor 1
32-core container for executor 2
32-core container for executor 3 fails because YARN only has 31 cores left to give.
My step should execute in client mode. It is the same with the Zeppelin instances.
HadoopJarStepConfig sparkStepConf = new HadoopJarStepConfig()
    .withJar("command-runner.jar")
    .withArgs("spark-submit",
        "--class", ".....",
        "--deploy-mode", "client",
        "/home/hadoop/jars/.....jar");
I use a bootstrap step to get the jars into /home/hadoop/jars, because with S3 paths only cluster deploy mode is allowed, which would tie up one of my executors with the SparkContext process.
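For completeness, the bootstrap action is just a small script that copies the jars down from S3 onto each node; a minimal sketch (the bucket and prefix below are placeholders, not my real paths):

#!/bin/bash
# Hypothetical bootstrap action: pull the application jars from S3 to a local
# path so spark-submit can reference them in client mode.
mkdir -p /home/hadoop/jars
aws s3 cp s3://my-bucket/jars/ /home/hadoop/jars/ --recursive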
All this means that if I only have one slave, nothing happens at all, and if I have three, one of them does no work, which is a waste of money.
In theory I could calculate executor cores minus 1 and force that setting in spark-submit (see the sketch below), but this is supposed to just work.
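A minimal sketch of that workaround on top of the step config above: --executor-cores 31 simply leaves one of the 32 vcores for the ApplicationMaster, and spark-submit command-line options take precedence over the defaults written by maximizeResourceAllocation (class and jar names stay elided as above):

HadoopJarStepConfig sparkStepConf = new HadoopJarStepConfig()
    .withJar("command-runner.jar")
    .withArgs("spark-submit",
        "--class", ".....",
        "--deploy-mode", "client",
        // leave 1 of the 32 vcores free for the YARN ApplicationMaster
        "--executor-cores", "31",
        "/home/hadoop/jars/.....jar");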
How can I tell YARN to put this 1-core ApplicationMaster process on the master node, to not create it at all, or to start executors with different core counts?
I only use 1-4 instances, so having one effectively sit idle is really not OK.

Related

AWS ECS Fargate Memory Utilization vs Local Docker

We are using AWS Fargate ECS tasks for our Spring WebFlux Java 11 microservice. We use a FROM gcr.io/distroless/java:11 base image. When our application is dockerised locally and run as an image inside a Docker container, the memory utilization is quite efficient and we can see that the heap usage never crosses 50%.
However, when we deploy the same image using the same Dockerfile to AWS Fargate as an ECS task, the AWS dashboard shows a completely different picture. The memory utilization never comes down, and the CloudWatch logs show no OutOfMemory issues at all. Once deployed in AWS ECS, we ran a peak load test and a stress test, after which memory utilization reached 94%, and then a 6-hour soak test; memory utilization was still 94% without any OOM errors. Garbage collection happens constantly and keeps the application from going OOM, but utilization stays at 94%.
To test the application's memory utilization locally we use VisualVM. We are also trying to connect to the remote ECS task in AWS Fargate using Amazon ECS Exec, but that is a work in progress.
We have seen the same issue with other microservices in our own and other clusters as well: once memory utilization reaches a maximum, it never comes down. Kindly help if someone has faced the same issue before.
Edit on 10/10/2022:
We connected to the AWS Fargate ECS task using Amazon ECS Exec, and below are the findings.
We analysed the GC logs of the AWS ECS Fargate task and could see the messages below. It uses the default GC, i.e. the Serial GC. We keep getting "Pause Young (Allocation Failure)", which means that the memory assigned to the young generation is not enough to satisfy allocations, so collections keep being triggered.
[2022-10-09T13:33:45.401+0000][1120.447s][info][gc] GC(1417) Pause Full (Allocation Failure) 793M->196M(1093M) 410.170ms
[2022-10-09T13:33:45.403+0000][1120.449s][info][gc] GC(1416) Pause Young (Allocation Failure) 1052M->196M(1067M) 460.286ms
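To double-check which collector and heap cap the JVM actually picked up inside the task, the final JVM flags can be dumped (a sketch; it assumes a shell and grep are available in the running container via ECS Exec):

java -XX:+PrintFlagsFinal -version | grep -Ei 'UseSerialGC|UseG1GC|MaxHeapSize|MaxRAMPercentage'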
We made some code changes related to a byte array being copied in memory twice, and the memory did come down, but not by much.
/app # ps -o pid,rss
PID RSS
1 1.4g
16 16m
30 27m
515 23m
524 688
1655 4
/app # ps -o pid,rss
PID RSS
1 1.4g
16 15m
30 27m
515 22m
524 688
1710 4
Even after a full GC like the one below, the memory does not come down:
[2022-10-09T13:39:13.460+0000][1448.505s][info][gc] GC(1961) Pause Full (Allocation Failure) 797M->243M(1097M) 502.836ms
One important observation was that after running a heap inspection, a full GC got triggered, and even that didn't clear up the memory. It shows 679M->149M, but the ps -o pid,rss command does not show the drop, and neither does the AWS Container Insights graph.
[2022-10-09T13:54:50.424+0000][2385.469s][info][gc] GC(1967) Pause Full (Heap Inspection Initiated GC) 679M->149M(1047M) 448.686ms
[2022-10-09T13:56:20.344+0000][2475.390s][info][gc] GC(1968) Pause Full (Heap Inspection Initiated GC) 181M->119M(999M) 448.699ms
How are you running it locally? Do you set any parameters (CPU/memory) for the container you launch? On Fargate there are multiple levels of resource configuration (the size of the task and the amount of resources you assign to the container; check out this blog for more details). The other thing to consider is that, with Fargate, you may land on an instance with far more capacity than the task size you configured. Fargate will create a cgroup that boxes your container(s) into that size, but some old programs (and Java versions) are not cgroup-aware: they may assume that the memory available is the memory of the underlying instance (which you don't see) rather than the task size (and cgroup) that was configured.
I don't have an exact answer (and this did not fit into a comment) but this may be an area you can explore (being able to exec into the container should help - ECS exec is great for that).
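If the cgroup/heap-sizing angle turns out to matter, one thing to explore (a sketch only; the values are illustrative, not tuned) is making the heap cap explicit relative to the container limit via JAVA_TOOL_OPTIONS in the Dockerfile:

# Sketch: Java 11 is container-aware by default, but the default max heap is
# only 25% of the detected memory limit; MaxRAMPercentage makes the cap
# explicit, and -Xlog:gc* keeps GC logging on.
FROM gcr.io/distroless/java:11
ENV JAVA_TOOL_OPTIONS="-XX:MaxRAMPercentage=75.0 -Xlog:gc*"
# ... rest of the existing Dockerfile (jar COPY, CMD) unchanged

Note also that a JVM does not necessarily return freed heap pages to the operating system, so RSS (which is what ECS Container Insights reports) can stay near its peak even after a full GC frees most of the heap; the heap numbers in the GC log are the better indicator of live memory.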

YARN jobs getting stuck in ACCEPTED state despite memory available

The cluster goes into a deadlock state and stops allocating containers even when GBs of RAM and vcores are available.
This was happening only when we started a lot of jobs in parallel, most of which were Oozie jobs with many forked actions.
After a lot of searching and reading related questions and articles, we came across a property called maxAMShare for the YARN job scheduler (we are using the Fair Scheduler).
What does it mean?
The percentage of memory and vcores from the user's queue share that can be allotted to ApplicationMasters. Default value: 0.5 (50%). Source
How did it cause the deadlock?
When we start multiple Oozie jobs in parallel, each Oozie job and its forked actions first require a couple of ApplicationMaster containers to be allotted for the Oozie launchers, which then start the other containers that do the actual action work.
In our case, we were starting around 20-30 Oozie jobs in parallel, each with close to 20 forked actions. With each action requiring 2 ApplicationMasters, close to 800 containers were being blocked by the Oozie ApplicationMasters alone.
Because of this, we were hitting the default 50% maxAMShare limit for our user queue, and YARN would not allow any new ApplicationMasters to be created to run the actual jobs.
Solution?
One quick suggestion could be to disable the check by setting this property to -1.0, but this is not recommended: you can again end up allocating all or most of the resources to AMs, and very little real work will get done.
The other option (which we went ahead with) is to specify a separate queue for the AMs in the Oozie configuration and then set the maxAMShare property of that queue to 1.0. This way you can control how many resources can be allocated to AMs without affecting the other jobs. Reference
<global>
    <configuration>
        <property>
            <name>oozie.launcher.mapred.job.queue.name</name>
            <value>root.users.oozie_am_queue</value>
        </property>
    </configuration>
</global>
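On the scheduler side, a hedged sketch of the matching piece of fair-scheduler.xml (queue names follow the snippet above; how this merges into your existing allocation file will vary):

<allocations>
    <queue name="users">
        <!-- ordinary jobs keep the default maxAMShare of 0.5 -->
        <queue name="oozie_am_queue">
            <!-- Oozie launcher AMs may use this queue's entire share -->
            <maxAMShare>1.0</maxAMShare>
        </queue>
    </queue>
</allocations>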
Hope this will be a major time saver for people facing the same issue. There could be many other reasons for deadlock too, which are already discussed in other questions on SO.

h2o instance shut down due to inactivity in console

I am using H2O to develop a model. After initiating the H2O instance I got an IP and port for opening H2O Flow in a web browser. I used the command below to start the H2O instance on the Hadoop cluster. The problem is that when I run a hyperparameter search, the job takes multiple hours, my shell session becomes inactive, and I am automatically logged out. This kills the console session, and the H2O instance is killed as well. I am using the RStudio interface with H2O. Is there any way to keep the H2O instance alive longer, without the automatic log-out/shutdown due to inactivity in the console?
start h2o cluster
hadoop jar /dsap/devl/h2o/h2o-3.10.4.1-hdp2.4/h2odriver.jar -nodes 30 -mapperXmx 8g -output /user/userid1/h2o1 -baseport 6335
Yes.
Add the -disown flag to do exactly that.
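For example, the launch command from the question with the flag appended (a sketch; the paths are the ones from the question, and -output typically needs to point to a directory that does not already exist):

hadoop jar /dsap/devl/h2o/h2o-3.10.4.1-hdp2.4/h2odriver.jar -nodes 30 -mapperXmx 8g -output /user/userid1/h2o1 -baseport 6335 -disown

With -disown the driver no longer blocks in the console once the cluster is up, so the cluster survives the console session; from R you would then reconnect with h2o.init(ip = ..., port = ...) using the IP and port the driver printed, and shut the cluster down explicitly with h2o.shutdown() when you are done.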

ClassNotFoundException while deploying new application version (with changed session object) in active Apache Ignite grid

We are currently integrating Apache Ignite in our application to share sessions in a cluster.
At this point we can successfully share sessions between two local Tomcat instances, but there is one use case that is not working so far.
When running the two local instances with the exact same code, everything works great. But once the Ignite logic is integrated into our production cluster, we will encounter the following use case:
Node 1 and Node 2, runs version 1 of the application
At this point we'd like to deploy version 2 of the application
Tomcat is stopped at Node 1, version 2 is deployed, and at the end of the deployment Tomcat at Node 1 is started again.
We now have Node 1 with version 2 of the code and Node 2, still with version 1
Tomcat is stopped at Node 2, version 2 is deployed, and at the end of the deployment Tomcat at Node 2 is started again.
We now have Node 1 with version 2 of the code and Node 2, with version 2
Deployment is finished
When reproducing this use case locally with two Tomcat instances in the same grid, the Ignite web session clustering fails. What I tested was removing one String property from a class (Profile) that resided in the user's session. When starting Node 1 with this changed class, I get the following exception:
Caused by: java.lang.ClassNotFoundException:
Optimized stream class checksum mismatch
(is same version of marshalled class present on all nodes?) [expected=4981, actual=-27920, cls=class nl.package.profile.Profile]
This will be a common/regular use case for our deployments. My question is: how do we handle this use case? Are there ways in Ignite to resolve or work around this kind of issue?
In my understanding, your use case is a perfect fit for Ignite binary objects [1].
This feature allows storing objects in a class-free format and modifying an object's structure at runtime, without a full cluster restart, when the version of an object changes.
Your Profile class should implement the org.apache.ignite.binary.Binarylizable interface, which gives you full control over the serialization and deserialization logic. With this interface you can even have two nodes in the cluster that use different versions of the class at serialization and deserialization time, by reading/writing only the required fields from/to the binary format.
[1] https://apacheignite.readme.io/docs/binary-marshaller
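A hedged sketch of what that could look like for the Profile class from the question (the field names here are made up for illustration; the real class has whatever properties your session stores):

import org.apache.ignite.binary.BinaryObjectException;
import org.apache.ignite.binary.BinaryReader;
import org.apache.ignite.binary.BinaryWriter;
import org.apache.ignite.binary.Binarylizable;

public class Profile implements Binarylizable {
    private String userName;
    private String nickname; // the property that is added/removed between versions

    // Write each field explicitly into the binary format.
    @Override
    public void writeBinary(BinaryWriter writer) throws BinaryObjectException {
        writer.writeString("userName", userName);
        writer.writeString("nickname", nickname);
    }

    // Read fields explicitly; a field the writing node never wrote simply
    // comes back as null instead of breaking deserialization.
    @Override
    public void readBinary(BinaryReader reader) throws BinaryObjectException {
        userName = reader.readString("userName");
        nickname = reader.readString("nickname");
    }
}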

elasticsearch-mesos not getting listed under frameworks in Mesos UI

I am trying to run elasticsearch-mesos on Mesos. My machine is running Ubuntu 14.04. I have a running Mesos cluster installed from the Mesosphere packages by following these instructions. When I run the test frameworks they get listed under frameworks in the Mesos UI, but elasticsearch-mesos does not get listed in the Mesos web UI. I want to run elasticsearch-mesos on top of Mesos. I followed the instructions given here. When I run ./elasticsearch-mesos I get a message in the terminal:
I0108 17:24:01.898540 23861 group.cpp:385] Trying to create path '/mesos' in ZooKeeper
I tried running ./elasticsearch-mesos on both the Mesos masters and slaves.
The last few lines of the terminal output are given below:
2015-01-08 17:24:01,881:23844(0x7f175bfff700):ZOO_INFO#zookeeper_init#786: Initiating
client connection, host=localhost:2181 sessionTimeout=10000 watcher=0x7f1762a3e6a0
sessionId=0 sessionPasswd=<null> context=0x7f1710002530 flags=0
I0108 17:24:01.881392 23858 sched.cpp:137] Version: 0.21.1
2015-01-08 17:24:01,881:23844(0x7f172b7fe700):ZOO_INFO#check_events#1703: initiated
connection to server [127.0.0.1:2181]
2015-01-08 17:24:01,897:23844(0x7f172b7fe700):ZOO_INFO#check_events#1750: session
establishment complete on server [127.0.0.1:2181], sessionId=0x14ac7c469270006,
negotiated timeout=10000
I0108 17:24:01.898455 23861 group.cpp:313] Group process (group(1)#127.0.1.1:38668)
connected to ZooKeeper
I0108 17:24:01.898509 23861 group.cpp:790] Syncing group operations: queue size (joins,
cancels, datas) = (0, 0, 0)
I0108 17:24:01.898540 23861 group.cpp:385] Trying to create path '/mesos' in ZooKeeper
According to the README at https://github.com/mesosphere/elasticsearch-mesos, you may need to modify mesos.master.url to point to the same ZooKeeper URL that the Mesos master is using (maybe not localhost).
If you're using a single-master Mesos cluster, you can skip the ZooKeeper URL and point this parameter directly to the Mesos master.
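Purely as an illustration (the exact configuration file is whatever the elasticsearch-mesos README points at, and the hosts below are placeholders), the setting would look roughly like this:

# multi-master cluster: use the same ZooKeeper ensemble the Mesos masters use
mesos.master.url: zk://zk1:2181,zk2:2181,zk3:2181/mesos
# single-master cluster: point directly at the master instead
# mesos.master.url: mesos-master-host:5050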
Please also note that the elasticsearch framework is a bit outdated, so use it with caution.