I am trying to configure the number of vcores per container for Tez tasks. I have edited the property tez.task.resource.cpu.vcores, but Tez is not picking up the value; it always allocates one vcore per requested container.
How do I increase the number of vcores per container in Tez?
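For reference, tez.task.resource.cpu.vcores is typically set in tez-site.xml (or per session); a sketch with an example value:

<!-- tez-site.xml: vcores requested for each Tez task container (value is an example) -->
<property>
  <name>tez.task.resource.cpu.vcores</name>
  <value>2</value>
</property>

Note that with the stock CapacityScheduler, YARN sizes containers by memory only (DefaultResourceCalculator), so vcore requests beyond 1 are effectively ignored unless the scheduler is switched to the DominantResourceCalculator. A hedged sketch of that switch, assuming the CapacityScheduler is in use:

<!-- capacity-scheduler.xml: make the scheduler account for vcores as well as memory -->
<property>
  <name>yarn.scheduler.capacity.resource-calculator</name>
  <value>org.apache.hadoop.yarn.util.resource.DominantResourceCalculator</value>
</property>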
When we try to extract a large table from SQL Server, we get an error:
Containerized process terminated by signal 119.
As per my understanding, Kubernetes containers have a limit on how much memory is allocated to each pod.
So if we have a limit on memory and the table is expected to be larger than that, what options do we have?
A Container can exceed its memory request if the Node has memory available. But a Container is not allowed to use more than its memory limit. If a Container allocates more memory than its limit, the Container becomes a candidate for termination. If the Container continues to consume memory beyond its limit, the Container is terminated. If a terminated Container can be restarted, the kubelet restarts it, as with any other type of runtime failure.
[source]
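If the container is being killed for exceeding its limit, the pod's last state usually shows it; for example (pod name hypothetical):

kubectl describe pod sql-extract
# look for: Last State: Terminated, Reason: OOMKilled, Exit Code: 137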
There are two possible reasons:
Your container exceeds its memory limit set in the spec.containers[].resources.limits.memory field; or
Your container exceeds the node's available memory.
In the first case you can increase the memory limit by changing the spec.containers[].resources.limits.memory value (see the sketch below).
In the second case you can either increase the node's resources or make sure the pod is scheduled on a node with more available memory.
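For the first case, a minimal pod spec sketch with the limit raised (the names, image and sizes are hypothetical):

apiVersion: v1
kind: Pod
metadata:
  name: sql-extract              # hypothetical
spec:
  containers:
  - name: extractor              # hypothetical
    image: my-extractor:latest   # hypothetical
    resources:
      requests:
        memory: "2Gi"
      limits:
        memory: "4Gi"            # raise this if the process is OOM-killed at the old limit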
I tried to limit the memory of my pod
resources:
  requests:
    cpu: 2000m
    memory: 100Mi
  limits:
    cpu: 2000m
    memory: 140Mi
However, if I run kubectl describe nodes, I can see that I am still allocated a node with 2 vCPUs and 16 GB of memory.
It looks like the memory value is invalid. From the AWS documentation: "Fargate rounds up to the compute configuration shown below that most closely matches the sum of vCPU and memory requests in order to ensure pods always have the resources that they need to run." (reference here)
You are requesting 2 vCPUs with 140 MiB of memory, which is far less than the 4 GB minimum for that CPU tier (4 GB ≈ 3817 Mi; you can run the conversion here).
From my reading of the AWS configuration, I would personally expect a pod with 2 vCPUs and 4 GB of RAM to be provisioned. But maybe the 140Mi is considered invalid and is rounded up to the maximum value for that range.
Or did you perhaps mean 14000Mi (about 14.6 gigabytes) of RAM?
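For comparison, a request that lines up with a documented Fargate combination for 2 vCPU (4 GB appears to be the smallest memory size listed for that tier in the linked reference) would look roughly like this:

resources:
  requests:
    cpu: 2000m
    memory: 4Gi
  limits:
    cpu: 2000m
    memory: 4Gi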
The output of kubectl describe nodes is not what counts in this case.
AWS sets a "CapacityProvisioned" annotation on the pod, which describes the provisioned instance size. The annotations are displayed in the console under your cluster, then Workloads, then Pods, at the bottom right.
It is possible that a node larger than requested is used; however, you are still limited to the requested resources.
Source: https://github.com/aws/containers-roadmap/issues/942#issuecomment-747416514
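The annotation can also be read from the command line, for example (pod name hypothetical; annotation key as reported in the linked issue):

kubectl get pod my-pod -o jsonpath='{.metadata.annotations.CapacityProvisioned}'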
I have a Docker image that is running as a Fargate task. I am curious how AWS bills for its use. Currently I have a hard limit of 1 GB and a soft limit of 512 MB. If I bump the hard limit up to 2 GB to avoid memory issues in certain cases, will I be charged for 2 GB all the time, or only for the period that the container needs it? Most of the time my application does not even need 512 MB, but occasionally it needs 2 GB.
Visit here for pricing details:
https://aws.amazon.com/fargate/pricing/
The lowest configuration is 0.25 vCPU, which supports up to 2 GB of memory. Fargate charges for the vCPU and memory configured for the task while it runs (per second, with a one-minute minimum), not for what the container happens to use at a given moment.
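As a rough sketch of the arithmetic (the per-vCPU-hour and per-GB-hour rates vary by region; see the pricing page above):

task charge ≈ (configured vCPUs × per-vCPU-hour rate + configured GB × per-GB-hour rate) × hours the task runs

So a task configured with 1 vCPU and 2 GB that runs for 10 hours is billed for 10 vCPU-hours and 20 GB-hours, regardless of how much of that memory the container actually touches at any given moment.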
We have a cluster running Hadoop and YARN on AWS EMR with one core node and one master node, each with 4 vCores, 32 GB of memory, and 32 GB of disk. We only have one long-running YARN application, and within it there are only one or two long-running Flink applications, each with a parallelism of 1. Checkpointing runs at a 10-minute interval with a minimum of 5 minutes between checkpoints. We use EventTime with a window of 10 minutes and a watermark duration of 15 seconds. The state is stored in S3 through the FsStateBackend with async snapshots enabled. Exactly-once checkpointing is enabled as well.
We have UUIDs set for all operators but do not have HA set up for YARN or an explicit max parallelism set for the operators.
Currently, when restoring from a checkpoint (3 GB), processing stalls at the windowing step until an org.apache.flink.util.FlinkException: The assigned slot <container_id> was removed error is thrown during the next checkpoint. I have seen that all operators except the one with the largest state (a ProcessFunction directly after the windowing) finish their checkpoints.
I know it is strongly suggested to use RocksDB for production, but is that mandatory for state that will most likely not exceed 50 GB?
Where would be the best place to start addressing this problem? Parallelism?
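For reference, the checkpointing setup described in the question corresponds roughly to a configuration like the following sketch; the bucket path, max parallelism value, and placeholder source are illustrative, not the actual job:

import org.apache.flink.runtime.state.filesystem.FsStateBackend;
import org.apache.flink.streaming.api.CheckpointingMode;
import org.apache.flink.streaming.api.TimeCharacteristic;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

public class CheckpointSetupSketch {
    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
        env.setStreamTimeCharacteristic(TimeCharacteristic.EventTime);

        // 10-minute checkpoint interval, exactly-once semantics,
        // at least 5 minutes pause between checkpoints
        env.enableCheckpointing(10 * 60 * 1000L, CheckpointingMode.EXACTLY_ONCE);
        env.getCheckpointConfig().setMinPauseBetweenCheckpoints(5 * 60 * 1000L);

        // FsStateBackend writing to S3, with asynchronous snapshots enabled
        env.setStateBackend(new FsStateBackend("s3://my-bucket/checkpoints", true));

        // An explicit max parallelism pins the key-group range so keyed state stays
        // restorable if operator parallelism is changed later (value is an example)
        env.setMaxParallelism(128);

        // Placeholder pipeline so the sketch executes; the real job has its own sources,
        // 10-minute event-time windows, ProcessFunction and sinks, each with .uid(...)
        env.fromElements(1, 2, 3).uid("placeholder-source").print();

        env.execute("checkpoint-setup-sketch");
    }
}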
I remember that some recent version of YARN has a configuration parameter which controls the amount of memory (or cores) a job can use. I tried to find it on the web but haven't been able to yet. If you know the parameter, please let me know.
I know one way to go about this is to use some kind of scheduler, but for now I need job-level control so that a job doesn't abuse the entire system.
Thanks!
You can control the maximum and minimum resources that are allocated to each container.
yarn.scheduler.minimum-allocation-mb: Minimum memory allocation for each container
yarn.scheduler.maximum-allocation-mb: Maximum memory allocation for each container
yarn.scheduler.minimum-allocation-vcores: Minimum core allocation for each container
yarn.scheduler.maximum-allocation-vcores: Maximum core allocation for each container
If you want to prevent abuse by user jobs, the yarn.scheduler.maximum-allocation-* settings can be the solution, because the ResourceManager rejects any request above these limits by throwing an InvalidResourceRequestException; a sketch follows below.
ref: yarn-default.xml
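A yarn-site.xml sketch with these limits (the values are illustrative, not recommendations):

<property>
  <name>yarn.scheduler.maximum-allocation-mb</name>
  <value>8192</value> <!-- container requests above 8 GB are rejected -->
</property>
<property>
  <name>yarn.scheduler.maximum-allocation-vcores</name>
  <value>4</value> <!-- container requests above 4 vcores are rejected -->
</property>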