Dataproc set number of vcores per executor container - hadoop-yarn

I'm building a Spark application which will run on Dataproc. I plan to use ephemeral clusters and spin a new one up for each execution of the application. So I basically want my job to consume as much of the cluster's resources as possible, and I have a very good idea of the requirements.
I've been playing around with turning off dynamic allocation and setting the executor instances and cores myself. Currently I'm using 6 instances and 30 cores apiece.
Perhaps it's more of a YARN question, but I'm finding the relationship between container vCores and my Spark executor cores a bit confusing. In the YARN application manager UI I see that 7 containers are spawned (1 driver and 6 executors), and each of these uses 1 vCore. Within Spark, however, I see that the executors themselves are using the 30 cores I specified.
So I'm curious whether the executors are trying to do 30 tasks in parallel on what is essentially a 1-core box. Or maybe the vCore count displayed in the AM GUI is erroneous?
If it's the former, I'm wondering what the best way is to set this application up so I end up with one executor per worker node and all the CPUs are used.

The vCore count displayed in the YARN GUI is erroneous; this is a known but not-well-documented quirk of the capacity-scheduler, which is Dataproc's default. Notably, with the default settings on Dataproc, YARN does resource bin-packing based only on memory, not on CPUs. The benefit is that this is more versatile for oversubscribing CPUs to varying degrees per workload, especially when something is IO-bound; the downside is that YARN won't be responsible for carving out CPU usage in a fixed manner.
See https://stackoverflow.com/a/43302303/3777211 for some discussion of changing to fair-scheduler to see the vCore allocation accurately represented in YARN. However, in your case there's probably no benefit to doing so; making YARN bin-pack across both dimensions is more of a "shared multitenant cluster" concern, and would only complicate the scheduling problem.
In your case, the best way to set your application up is just to ignore what YARN says about vCores; if you want one executor per worker node, set the executor memory size to the maximum that will fit in YARN per node, and make the cores per executor equal to the total number of cores per node.
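For illustration only, a sketch assuming a hypothetical 32-core worker with roughly 100g of YARN memory per node (check yarn.nodemanager.resource.memory-mb and yarn.scheduler.maximum-allocation-mb for your actual machine type):

# Hypothetical sizes: 6 workers, 32 cores, ~100g usable YARN memory each.
spark.dynamicAllocation.enabled=false
spark.executor.instances=6
spark.executor.cores=32
spark.executor.memory=90g
spark.executor.memoryOverhead=8g

Keep spark.executor.memory plus the overhead below the YARN per-node maximum; on older Spark versions the overhead property is spark.yarn.executor.memoryOverhead instead.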

Related

How does the YARN container use the allocated CPU?

I am struggling to understand how YARN containers are limited to their allocated resources, especially the CPU.
I am running Spark or Flink jobs in the YARN cluster. Each executor or task manager requests a YARN container that has 1 CPU. Basically, the number of containers equals the number of CPUs available on the host.
I understand that YARN monitors memory usage, and if a container exceeds the limit, it sends a kill signal. I am wondering how CPU scheduling really works.
My JVM job in the YARN container (1 CPU) can try to create multiple CPU-bound worker threads. Will the JVM be limited to 1 CPU core to execute those threads, or will it steal resources from other containers? Can a YARN container technically affect other containers' CPU performance?
Let's say I have 10 CPUs on the host and I create a single container. Will that container's CPU performance be 10% of the host's CPU performance?
By default, YARN only allocates resources by RAM, so it hopes everyone plays nicely, and you can be affected by CPU-hungry jobs. You can change this:
From Apache:
yarn.scheduler.capacity.resource-calculator: The ResourceCalculator implementation to be used to compare Resources in the scheduler. The default, i.e. org.apache.hadoop.yarn.util.resource.DefaultResourceCalculator, only uses Memory, while DominantResourceCalculator uses Dominant-resource to compare multi-dimensional resources such as Memory, CPU etc. A Java ResourceCalculator class name is expected.
In general it's enough to estimate by memory. Most people actually estimate their requirements for memory and threads very poorly, and it's usually best to ignore threads unless you encounter issues. If it continues to be an issue, then maybe consider looking at DominantResourceCalculator. If/when you turn on DominantResourceCalculator, be ready for a lot of people to feel the impact: threads may have been grossly over-allocated, and once you start counting them, users will suddenly have to account for what they've asked for. (Or at least this was my experience.) This can appear to drastically shrink the capacity of your cluster, as space is reserved where it wasn't before.
TL;DR: Don't touch this unless you have a good reason. (Wait until it's a problem; don't optimize until there is a bottleneck.) Users can make innocent mistakes in their resource estimation, and it can be painful to grow their ability to correctly estimate what they need.
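If you do reach that point, the switch is a single property in capacity-scheduler.xml (a sketch; refresh the queues or restart the ResourceManager afterwards):

<property>
  <name>yarn.scheduler.capacity.resource-calculator</name>
  <value>org.apache.hadoop.yarn.util.resource.DominantResourceCalculator</value>
</property>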

Hazelcast is using a high number of JVM threads

I am using Hazelcast in the JVM in my application, which runs as 2 replicas in Kubernetes. Hazelcast in both containers has formed a cluster, and sync is working perfectly fine.
But my application started using 20% more threads after we started using Hazelcast. On analyzing a thread dump, I found that Hazelcast is using that extra 20%.
Is it okay for Hazelcast to use this many threads? If this can be reduced, how can I go about it?
Hazelcast will self-size the number of threads it uses, based on the number of processors available to it.
(In Java, see Runtime.availableProcessors() )
How many does your container have allocated?
You can override the threading if you are sure it's inappropriate. Look for system properties like hazelcast.*.thread.count in the Hazelcast documentation. There are many options, and tuning them is not a casual task; if you tune the numbers down, you risk very poor performance.
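As a sketch only, the most commonly tuned of those properties can be passed as JVM flags; the values and my-app.jar below are purely illustrative, not recommendations:

# Illustrative values only; size these against your actual workload.
java -Dhazelcast.operation.thread.count=2 \
     -Dhazelcast.io.thread.count=2 \
     -Dhazelcast.event.thread.count=1 \
     -jar my-app.jar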

Usage of cores in Spark SQL execution

I am new to Spark SQL queries and am trying to understand how it works under the hood.
I have come across the term "core" in the Spark vocabulary, but I am still struggling to get a handle on it.
I know that 1 core = 1 task.
My questions:
Can anyone please explain what exactly a core means?
Does the Spark UI show the number of cores currently allocated for my job? If yes, where can I see it?
If I find in the Spark UI that the number of tasks running is low, is there a way to increase the number of cores allocated for my job, so that Spark can submit more tasks and make my job run faster?
Please advise.
Yes, you are right in a way.
In Spark, tasks are distributed across executors; on each executor, the number of tasks running in parallel equals the number of cores on that executor. So basically, a core is what executes your tasks, and a task is the most granular unit of work to be carried out.
JOB=>STAGE=>TASK
Yes, the Spark UI shows you the number of tasks currently running on each of your executors. You can check them under the Executors tab. This tab gives you a very detailed view of your task allocation against the number of cores available, along with a lot of other details.
Yes, you can increase the number of cores. You can do that by passing the argument in the spark-submit command.
--executor-cores n
Here n is the number of cores you want. For optimum usage, a common rule of thumb is 5, to keep per-executor HDFS I/O throughput healthy.
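For example, a hypothetical submission (my_job.py and the resource numbers are placeholders; size them to your cluster):

# my_job.py is a placeholder application name.
spark-submit \
  --num-executors 6 \
  --executor-cores 5 \
  --executor-memory 10g \
  my_job.py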
More cores does not necessarily mean your job will run faster.
Your tasks need to be distributed evenly across all the available cores for the job to run faster.
If you provide more cores than required, they will remain idle most of the time.

Stopping when the solution is good enough?

I successfully implemented a solver that fits my needs. However, I need to run the solver on 1500+ different "problems" at 0:00 precisely, every day. Because my web app is in Ruby, I built a Quarkus "micro-service" that takes the data, calculates a solution, and returns it to my main app.
In my application.properties, I set:
quarkus.optaplanner.solver.termination.spent-limit=5s
which means each request takes ~5s to solve. But sending 1500 requests at once will saturate the CPU on my machine.
Is there a way to tell OptaPlanner to stop when the solution is good enough? (For example, if the score is stable...) That way I can maybe reduce the time from 5s to 1-2s, depending on the problem.
What are your recommendations for my specific scenario?
The SolverManager will automatically queue solver jobs if too many come in, based on its parallelSolverCount configuration:
quarkus.optaplanner.solver-manager.parallel-solver-count=3
In this case, it will run 3 solvers in parallel. So if 7 datasets come in, it will solve 3 of them right away and the other 4 later, as the earlier solvers terminate. However, if you use moveThreadCount=2, then each solver uses at least 2 CPU cores, so you're using at least 6 CPU cores.
By default, parallelSolverCount is currently set to half your CPU cores (it currently ignores moveThreadCount). In containers, it's important to use JDK 11+: the CPU count of the container is often different from that of the bare-metal machine.
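For example, to make the CPU budget explicit rather than relying on the default (a sketch; parallel-solver-count times move-thread-count should stay at or below your core count, and the move-thread-count property assumes an OptaPlanner version that exposes it in Quarkus):

# Sketch: 3 solvers x 2 move threads = at most 6 busy cores.
quarkus.optaplanner.solver-manager.parallel-solver-count=3
quarkus.optaplanner.solver.move-thread-count=2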
You can indeed tell the OptaPlanner Solvers to stop when the solution is good enough, for example when a certain score is attained or when the score hasn't improved for a given amount of time, or combinations thereof. See these OptaPlanner docs. Quarkus exposes some of these already (the rest currently still need a solverConfig.xml file); some Quarkus examples:
quarkus.optaplanner.solver.termination.spent-limit=5s
quarkus.optaplanner.solver.termination.unimproved-spent-limit=2s
quarkus.optaplanner.solver.termination.best-score-limit=0hard/-1000soft
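For the terminations not yet exposed as Quarkus properties, a minimal solverConfig.xml sketch might look like this (multiple terminations are OR-ed by default, so the solver stops as soon as any one limit is reached):

<solver>
  <termination>
    <!-- Stop after 5s, or after 2s without improvement, or once the score is good enough. -->
    <secondsSpentLimit>5</secondsSpentLimit>
    <unimprovedSecondsSpentLimit>2</unimprovedSecondsSpentLimit>
    <bestScoreLimit>0hard/-1000soft</bestScoreLimit>
  </termination>
</solver>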

How to ensure multiple redis instances running on different cores?

I have a 4-core server and I want to run Redis on it. To fully utilize the capabilities of the 4 cores, I expect to launch 4 Redis instances, since Redis is designed to be single-threaded.
However, I'm curious how to ensure that the 4 instances run on exactly 4 different cores. How can an instance choose the core on which it runs when it is launched?
Redis itself does not provide such guarantee.
If you launch 4 instances, there will be 4 different processes that the operating system will schedule on the 4 cores. It is up to the OS to perform this load balancing, optimizing the performance of the system.
Now, if you really want to bind each instance to a specific core, modern OSes usually provide tools to enforce the execution of a process on a specific CPU core.
For instance, on Linux, you can have a look at the taskset and the numactl commands.
In practice, you need to be careful with this, because once you launch Redis on a specific core (setting a CPU mask), all the threads and child processes will inherit this CPU mask. So when Redis triggers a background save operation or a background AOF rewrite, it will seriously impact the performance of the Redis instance, because the main Redis thread will have to share its CPU core with the background operation (which is typically CPU-consuming).
If you really want to play with CPU binding (but is it really a good idea?), you need to bind N Redis instances to N+1 CPU cores, keeping one core free for the background operations, and make sure at most one background operation can run at the same time for these instances.
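For illustration, on the 4-core box from the question, that scheme could look like this (the ports and config paths are hypothetical):

# Bind 3 instances to cores 0-2; core 3 stays free for forked saves/AOF rewrites.
taskset -c 0 redis-server /etc/redis/redis-6379.conf
taskset -c 1 redis-server /etc/redis/redis-6380.conf
taskset -c 2 redis-server /etc/redis/redis-6381.conf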