How to limit the memory size of an EKS Fargate instance - amazon-eks

I tried to limit the memory of my pod:
resources:
  requests:
    cpu: 2000m
    memory: 100Mi
  limits:
    cpu: 2000m
    memory: 140Mi
However, kubectl describe nodes still shows that I was allocated a 2 vCPU, 16 GB memory node.

It looks like the memory value is invalid. From the AWS documentation: "Fargate rounds up to the compute configuration shown below that most closely matches the sum of vCPU and memory requests in order to ensure pods always have the resources that they need to run."
You are defining 2 vCPUs with 140 MiB of memory, which is far below the 4 GB minimum for that CPU size (4 GB ≈ 3815 Mi).
Reading the AWS configuration table, I would personally expect a pod with 2 vCPUs and 4 GB of RAM to be provisioned. But maybe the 140Mi is considered invalid and is rounded up to the maximum value for that range.
Did you perhaps mean 14000Mi (about 14.7 GB) of RAM?
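Either way, here is a minimal sketch of a spec that lines up with a valid Fargate combination for 2 vCPUs (values are illustrative; the 4 GB floor applies at this CPU size):
resources:
  requests:
    cpu: 2000m
    memory: 4Gi
  limits:
    cpu: 2000m
    memory: 4Gi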

The output of kubectl describe nodes is not what counts in this case.
AWS sets a CapacityProvisioned annotation on the pod, which describes the instance size actually in use. The annotations are displayed in the AWS console under your cluster, then Workloads, then Pods, on the bottom right.
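You can also read the annotation directly with kubectl; a hedged one-liner (pod name and namespace are placeholders):
kubectl get pod my-pod -n my-namespace -o jsonpath='{.metadata.annotations.CapacityProvisioned}'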
It is possible that a node larger than requested is used; however, you are still limited to the requested resources.
Source: https://github.com/aws/containers-roadmap/issues/942#issuecomment-747416514

Related

Containerized process terminated by signal 119

When we try to extract a large table from a SQL server, we get an error:
Containerized process terminated by signal 119.
As per my understanding, Kubernetes containers have a limit on how much memory is allocated to each pod.
So if we have a memory limit and the table is expected to be larger, what options do we have?
A Container can exceed its memory request if the Node has memory available. But a Container is not allowed to use more than its memory limit. If a Container allocates more memory than its limit, the Container becomes a candidate for termination. If the Container continues to consume memory beyond its limit, the Container is terminated. If a terminated Container can be restarted, the kubelet restarts it, as with any other type of runtime failure.
[source]
There are two possible reasons:
Your container exceeds its memory limit set in the spec.containers[].resources.limits.memory field; or
Your container exceeds the node's available memory.
In the first case you can increase the memory limit by changing the spec.containers[].resources.limits.memory value, as in the sketch below.
In the second case you can either increase the node's resources or make sure the pod is scheduled on a node with more available memory.
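A minimal sketch of the first option (container name and values are illustrative, not taken from the original question):
containers:
- name: sql-extract          # hypothetical container name
  resources:
    requests:
      memory: "2Gi"
    limits:
      memory: "4Gi"          # raise this if the extract needs more headroom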

How does AWS charge for use of Fargate tasks?

I have a Docker image that is running as a Fargate task. I am curious to know how AWS bills for the use of it. Currently I have a hard limit of 1GB and a soft limit of 512MB. If I bump the hard limit up to 2GB to avoid memory issues in certain cases, will I be charged for 2GB all the time or only for the periods when the container needs it? Most of the time my application does not even need 512MB, but occasionally it needs 2GB.
See here for pricing details:
https://aws.amazon.com/fargate/pricing/
The lowest vCPU configuration is 0.25, which supports up to 2 GB of memory. Note that Fargate bills for the vCPU and memory configured for the task for as long as it runs, not for what the container actually uses, so a task configured with 2 GB is charged for 2 GB whenever it is running.
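As a worked example, assuming illustrative us-east-1 rates of $0.04048 per vCPU-hour and $0.004445 per GB-hour (check the pricing page for your region's current rates), a 0.25 vCPU / 2 GB task running for 24 hours would cost roughly (0.25 × 0.04048 + 2 × 0.004445) × 24 ≈ $0.46.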

Redis stream 50k consumer support parallel - capacity requirement

What are the Redis capacity requirements to support 50k consumers within one consumer group consuming and processing messages in parallel? I'm looking to test an infrastructure for this scenario and need to understand the considerations.
Disclaimer: I worked at a company that used Redis at a somewhat large scale (probably fewer consumers than your case, but our consumers were very active). I wasn't on the infrastructure team, but I was involved in some DevOps tasks.
I don't think you will find an exact number, so I'll try to share some tips and tricks to help you:
Be sure to read the entire Redis Admin page. There's a lot of useful information there. I'll highlight some of the tips from it (a consolidated sketch of the settings follows this list):
Assuming you'll set up a Linux host, edit /etc/sysctl.conf and set a high net.core.somaxconn (RabbitMQ suggests 4096). Check the documentation of tcp-backlog config in redis.conf for an explanation about this.
Assuming you'll set up a Linux host, edit /etc/sysctl.conf and set vm.overcommit_memory = 1. Read below for a detailed explanation.
Assuming you'll set up a Linux host, edit /etc/sysctl.conf and set fs.file-max. This is very important for your use case: the open file handles / file descriptors limit is essentially the maximum number of file descriptors (each client represents one file descriptor) the OS can handle. Please check the Redis documentation on this; the RabbitMQ documentation also presents some useful information about it.
If you edit the /etc/sysctl.conf file, run sysctl -p to reload it.
"Make sure to disable Linux kernel feature transparent huge pages, it will affect greatly both memory usage and latency in a negative way. This is accomplished with the following command: echo never > /sys/kernel/mm/transparent_hugepage/enabled." Add this command also to /etc/rc.local to make it permanent over reboot.
In my experience Redis is not very resource-hungry, so I believe you won't have issues with CPU. Memory usage is directly related to how much data you intend to store.
If you set up a server with many cores, consider using more than one Redis Server. Redis is (mostly) single-threaded and will not use all your CPU resources if you use a single instance in a multicore environment.
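For example, a hedged sketch of running two instances side by side (config paths and ports are placeholders):
redis-server /etc/redis/redis-6379.conf    # first instance, port 6379
redis-server /etc/redis/redis-6380.conf    # second instance, port 6380
Each config file would set its own port, pidfile and data directory.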
Redis also warns about wrong or risky configurations on startup, so check its log right after launch.
Explanation on Overcommit Memory (vm.overcommit_memory)
"Setting overcommit_memory to 1 tells Linux to relax and perform the fork in a more optimistic allocation fashion, and this is indeed what you want for Redis" [from the Redis FAQ].
There are three possible settings for vm.overcommit_memory:
0 (zero): Heuristic overcommit, the default. The kernel allows moderate overcommitment but refuses allocations that are obviously excessive.
1 (one): Always overcommit. The kernel returns success to every allocation request regardless of how much memory is actually available. This is what Redis recommends, so that fork() for background saves succeeds even when the dataset occupies most of the RAM.
2 (two): Never overcommit. Total committed address space is limited to swap plus vm.overcommit_ratio percent of RAM. For instance, with vm.overcommit_ratio = 50 and 1 GB of RAM, the kernel permits up to swap + 512 MB to be committed before requests fail.

Dask Yarn failed to allocate number of workers

We have a CDH cluster (version 5.14.4) with 6 worker servers, for a total of 384 vcores (64 cores per server).
We are running some ETL processes using dask version 2.8.1, dask-yarn version 0.8 with skein 0.8.
Currently we are having a problem allocating the maximum number of workers.
We are not able to run a job with more than 18 workers (we can see the actual number of workers in the Dask dashboard).
The definition of the cluster is as follows:
from dask_yarn import YarnCluster

cluster = YarnCluster(environment='path/to/my/env.tar.gz',
                      n_workers=24,
                      worker_vcores=4,
                      worker_memory='64GB')
Even when increasing the number of workers to 50 nothing changes, although when changing the worker_vcores or worker_memory we can see the changes in the dashboard.
Any suggestions?
update
Following @jcrist's answer I realized that I didn't fully understand the terminology mapping between the YARN web UI application dashboard and the YarnCluster parameters.
From my understanding:
a YARN container is equal to a dask worker.
Whenever a YARN cluster is generated there are 2 additional workers/containers running (one for a scheduler and one for a logger, each with 1 vCore).
The relationship between n_workers * worker_vcores and n_workers * worker_memory is something I still need to fully grok.
There is another issue: while optimizing I tried using cluster.adapt(). The cluster was running with 10 workers, each with 10 threads and a limit of 100GB, but the YARN web UI displayed only 2 containers running (my cluster has 384 vCores and 1.9TB, so there is still plenty of room to expand). Probably worth opening a different question.
There are many reasons why a job may be denied more containers. Do you have enough memory across your cluster to allocate that many 64 GiB chunks? Further, does 64 GiB tile evenly across your cluster nodes? Is your YARN cluster configured to allow jobs that large in this queue? Are there competing jobs that are also taking resources?
You can see the status of all containers using the ApplicationClient.get_containers method.
>>> cluster.application_client.get_containers()
You could filter on state REQUESTED to see just the pending containers
>>> cluster.application_client.get_containers(states=['REQUESTED'])
this should give you some insight as to what's been requested but not allocated.
If you suspect a bug in dask-yarn, feel free to file an issue (including logs from the application master for a problematic run), but I suspect this is more an issue with the size of containers you're requesting, and how your queue is configured/currently used.
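If you do file an issue, a hedged way to grab the application master logs from YARN for a finished run (the application id is a placeholder):
yarn logs -applicationId application_1234567890123_0001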

EBS Volume from Ubuntu to RedHat

I would like to use an EBS volume with data on it that I've been working with in an Ubuntu AMI in a RedHat 6 AMI. The issue I'm having is that RedHat says that the volume does not have a valid partition table. This is the fdisk output for the unmounted volume.
Disk /dev/xvdk: 901.9 GB, 901875499008 bytes
255 heads, 63 sectors/track, 109646 cylinders
Units = cylinders of 16065 * 512 = 8225280 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes
Disk identifier: 0x00000000
Disk /dev/xvdk doesn't contain a valid partition table
Interestingly, the volume isn't actually 901.9 GB but 300 GB. I don't know if that means anything. I am very concerned about possibly erasing the data on the volume by accident. Can anyone give me some pointers for formatting the volume for RedHat without deleting its contents?
I also just checked that the volume works in my Ubuntu instance and it definitely does.
I'm not able to advise on the partition issue as such, other than stating that you definitely neither need nor want to format it, because formatting is indeed a (potentially) destructive operation. My best guess would be that RedHat isn't able to identify the file system currently in use on the EBS volume, which has to be advertised by some means accordingly.
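To see what's actually on the device without any risk of writing to it, a couple of read-only commands (device name taken from your fdisk output):
sudo file -s /dev/xvdk      # reports the file system type, e.g. "ext4 filesystem data"
sudo blkid /dev/xvdk        # shows the UUID and TYPE if a file system is present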
However, to ease with experimenting and gain some peace of mind, you should get acquainted with one of the major Amazon EBS features, namely to create point-in-time snapshots of volumes, which are persisted to Amazon S3:
These snapshots can be used as the starting point for new Amazon EBS volumes, and protect data for long-term durability. The same snapshot can be used to instantiate as many volumes as you wish.
This is detailed further down in section Amazon EBS Snapshots:
Snapshots can also be used to instantiate multiple new volumes, expand the size of a volume or move volumes across Availability Zones. When a new volume is created, there is the option to create it based on an existing Amazon S3 snapshot. In that scenario, the new volume begins as an exact replica of the original volume. [...] [emphasis mine]
Therefore you can (and actually should) always start experiments or configuration changes like the one you are about to perform by at least snapshotting the volume (which will allow you to create a new one from that point in time in case things go bad) or creating a new volume from that snapshot immediately for the specific task at hand.
You can create snapshots and new volumes from snapshots via the AWS Management Console; as usual, respective APIs are available as well for automation purposes (see API and Command Overview), and Creating an Amazon EBS Snapshot for details.
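For instance, a hedged sketch using the AWS CLI (the volume id is a placeholder):
aws ec2 create-snapshot --volume-id vol-0123456789abcdef0 --description "Backup before attaching to RedHat"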
Good luck!