Kubernetes - Benefit of Reducing pods - optimization

What would be the main benefits of reducing the number of replicas:
    deployment:
      replicaCount: 100
      maxReplicaCount: 1000
      rollingUpdate:
        maxSurge: 150
Throughput (rpm) is not that high, and I'm planning to reduce the replica count.

In my case, which is a dev-stage environment, the benefit is that you can free up resources for your prioritized services or spin up new deployments for new features; for example, something like the sketch below.
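For illustration only, here is a rough sketch of what the reduced values might look like, reusing the question's own structure; the specific numbers are assumptions, not values from the question, and should be tuned against your observed throughput:

    deployment:
      replicaCount: 20        # assumed reduced baseline, sized to the current rpm
      maxReplicaCount: 200    # assumed; leaves headroom if an autoscaler needs to scale out
      rollingUpdate:
        maxSurge: 30          # assumed; scaled down roughly in proportion to replicaCount

The saving is simply the sum of the CPU and memory requests of the removed replicas, which the scheduler can then hand to other workloads.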

Redis cluster total size

I have a quick question about Redis Cluster.
I'm setting up a Redis cluster on Google Kubernetes Engine. I'm using the n1-highmem-2 machine type with 13GB of RAM, but I'm slightly confused about how to calculate the total available size of the cluster.
I have 3 nodes, each with 13GB of RAM. I'm running 6 pods (2 on each node), 1 master and 1 slave per node. This all works. I've assigned 6GB of RAM to each pod in my pod definition YAML file.
Is it correct to say that my total cluster size would be 18GB (3 masters * 6GB), or can I count the slaves' size toward the total size of the Redis cluster?
Redis Cluster master-slave model
In order to remain available when a subset of master nodes are failing or are not able to communicate with the majority of nodes, Redis Cluster uses a master-slave model where every hash slot has from 1 (the master itself) to N replicas (N-1 additional slave nodes).
So slaves are read-only replicas of the read-write masters, kept for availability; hence your total workable size is the size of your master pods, i.e. 3 masters * 6GB = 18GB.
Keep in mind, though, that leaving a master and its slave on the same Kubernetes node only protects against pod failure, not node failure, so you should consider redistributing them, for example with the anti-affinity sketch below.
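As a minimal sketch of what that redistribution could look like (assuming your Redis pods carry a label such as app: redis-cluster, which is an assumption here rather than something from the question), a pod anti-affinity rule asks the scheduler to prefer spreading the pods across nodes:

    affinity:
      podAntiAffinity:
        preferredDuringSchedulingIgnoredDuringExecution:
        - weight: 100
          podAffinityTerm:
            labelSelector:
              matchLabels:
                app: redis-cluster        # assumed label; match your own pod labels
            topologyKey: kubernetes.io/hostname

Using requiredDuringSchedulingIgnoredDuringExecution instead would forbid co-location entirely, at the cost of pods staying Pending when no suitable node is available.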
You didn't mention how you are installing Redis, but I'd like to mention the Bitnami Redis Helm chart, as it's built for production use, deploys 1 master and 3 slaves for good fault tolerance, and has tons of configuration options that are easily customized through the values.yaml file.
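For example, overrides along these lines could be passed to that chart; the exact key names differ between chart versions, so treat these keys as assumptions and check the chart's own values.yaml before using them:

    cluster:
      enabled: true
      slaveCount: 3        # assumed key; number of read-only replicas
    master:
      resources:
        requests:
          memory: 6Gi      # mirrors the 6GB per-pod request from the question
    slave:
      resources:
        requests:
          memory: 6Gi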

Hive LLAP low Vcore allocation

Problem Statement:
Hive LLAP daemons are not consuming the cluster's vCPU allocation: 80-100 cores are available for the LLAP daemons, but only 16 are being used.
Summary:
I am testing Hive LLAP on Azure using 2 D14_v2 head nodes, 16 D14_v2 worker nodes, and 3 A-series ZooKeeper nodes (D14_v2 = 112GB RAM / 12 vCPUs).
15 of the 16 nodes of the cluster are dedicated to LLAP.
The distribution is HDP 2.6.3.2-14.
Currently the cluster has a total of 1.56TB of RAM and 128 vCPUs available. The LLAP daemons are allocated the proper amount of memory, but they only use 16 vCPUs in total (1 vCPU per daemon + 1 vCPU for Slider).
Configuration:
My relevant hive configs are as follows:
    hive.llap.daemon.num.executors = 10    (10 of the 12 available vCPUs per node)
    YARN max vCores per container = 8
Other:
I have been load testing the cluster but am unable to get any more vCPUs engaged in the process. Any thoughts or insights would be greatly appreciated.
The Resource Manager UI will only show you the query coordinators' and Slider's core and memory allocation; each query coordinator in LLAP occupies 1 core plus the minimum allotted Tez AM memory (tez.am.resource.memory.mb). To check real-time core usage by the LLAP service on HDP 2.6.3, follow the steps below:
    Ambari -> Hive -> Quick Links -> Grafana -> Hive LLAP Overview -> Total Execution Slots

How fast can ECS Fargate boot a container?

What is the minimum/average time for AWS ECS Fargate to boot and run a Docker image?
For argument's sake, the 45MB anapsix/alpine-java image.
I would like to investigate using ECS Fargate to speed up the process of building software locally on a slow laptop/PC by having the software built on a faster remote server.
As such, the boot-up time of the image is crucial in making the endeavour worthwhile.
I would disagree with the accepted answer given my experience with Fargate.
I have launched thousands of containers on Fargate and was even featured in an AWS architecture blog for our usage of Fargate: https://aws.amazon.com/blogs/architecture/building-real-time-ai-with-aws-fargate/
Private subnets behind a NAT gateway show no different launch times for us than containers behind an IGW. If you use single NAT instances, sure, your mileage may vary.
Container launch times in Fargate are entirely determined by how large your container is. Fargate does not cache containers, so every run-task results in a docker pull happening. If your images are based on Ubuntu, you will have a bad time.
We have a mix of Go from-scratch containers and Alpine Node containers.
On average, based on the metrics we have aggregated from thousands of launches, from-scratch containers start and are healthy in the target group in 10-15 seconds.
Alpine containers take on average 30-40 seconds to launch and become healthy.
Anything longer than that and your containers are likely too large for Fargate to make any sense, at least until they offer pre-cached ECR or something similar.
For your specific example, we have similarly sized containers; if your entrypoint becomes healthy quickly (i.e. not a 60-second Java start time), your 45MB container should launch and be ready to go in 30-60 seconds.
I am still waiting for caching in Fargate that is already available on ECS+EC2. This feature request can be tracked here. It is a pain in the ass that containers take such a long time to boot on AWS Fargate. Google Cloud Platform already offers this feature as generally available with its managed Cloud Run (K8s) environment, where containers spin up on the fly (~2 seconds) when they receive a request. They go idle after (a configurable) 5 minutes, so you are only billed for those 5 minutes.
AWS Fargate does not offer such a nice "warm containers" feature yet, although I would highly recommend that they do. It is probably technically difficult to get compute and storage close enough together to accomplish this; it would require an enormous amount of internal bandwidth to load those containers as fast as Google does.
Nevertheless, below is my experience with Docker containers on AWS Fargate. Boot time is highly correlated with container image size as you can see from the following sample of containers I booted (February 2019):
4000 MB ~ 5 minutes
2400 MB ~ 4 minutes
1000 MB ~ 2 minutes
350 MB ~ 50 seconds
I would recommend building your container image on a lightweight base image, such as Minideb or Alpine. This keeps your container image pretty small, ranging from a few tens of MBs to a few hundred MBs. But then again, when you need a JVM or Python with some additional packages and C libs, you can easily reach 1000 MB.
I've launched more than 100 containers now on Fargate, and on a public VPC they take about 4 minutes on average, but I've seen it take as long as 7-8 minutes on a bad day.
If you launch into a private VPC, the timing can go south in a hurry. I've seen it take 2 hours to launch a Fargate container when the NAT instance is overloaded.
Hopefully AWS will speed this up over time. It shouldn't take me longer to launch a Fargate container than it does to upload my docker image to ECR.
One could use ECS_IMAGE_PULL_BEHAVIOR = prefer-cached with the EC2 launch type to reduce start-up times to a great extent, as sketched below.
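For illustration, on an EC2 container instance this is set in the ECS agent configuration file; the cluster name below is a hypothetical placeholder:

    # /etc/ecs/ecs.config (EC2 launch type only; Fargate does not expose this setting)
    ECS_CLUSTER=my-cluster                  # hypothetical cluster name
    ECS_IMAGE_PULL_BEHAVIOR=prefer-cached   # pull only when the image is not already cached on the instance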

YARN Resource Manager not utilising entire cluster resources

We recently moved the YARN Resource Manager daemon from a static machine to a cloud machine. The same configurations are being used: yarn-site.xml and capacity-scheduler.xml are the same as before, with only the machine name changed.
Rest of the cluster is a mix of both static and cloud machines.
The Resource Manager UI shows all nodes as live, and the available memory and vCore capacity is accurately reported.
    root.default stats: default: capacity=0.9,
    absoluteCapacity=0.9,
    usedResources=<memory:12288, vCores:6>,
    usedCapacity=0.043290164,
    absoluteUsedCapacity=0.038961038,
    numApps=7,
    numContainers=6
Queue configuration details:
    Queue State: RUNNING
    Used Capacity: 3.6%
    Absolute Used Capacity: 3.2%
    Absolute Capacity: 90.0%
    Absolute Max Capacity: 95.0%
    Used Resources: <memory:10240, vCores:5>
    Num Schedulable Applications: 5
    Num Non-Schedulable Applications: 0
    Num Containers: 5
    Max Applications: 9000
    Max Applications Per User: 9000
    Max Schedulable Applications: 29260
    Max Schedulable Applications Per User: 27720
    Configured Capacity: 90.0%
    Configured Max Capacity: 95.0%
    Configured Minimum User Limit Percent: 100%
    Configured User Limit Factor: 1.0
    Active users: thirdeye <Memory: 10240 (100.00%), vCores: 5 (100.00%), Schedulable Apps: 5, Non-Schedulable Apps: 0>
At any point in time only 4-6 containers are being allocated. These are allocated on both the static and cloud NodeManagers of the cluster.
What could be the reason for the under-utilization of the cluster resources?
Submitted jobs are piling up (right now at 7K).

WSO2 ESB High Availability/ clustering Environment system requirements

I want information on the WSO2 ESB clustering system requirements for production deployment on Linux.
I went through the following link: ESB clustering.
I understand that more than one copy of the WSO2 ESB would be extracted and set up on a single server for the worker nodes, and similarly on the other server for the manager (DepSync and admin) and worker nodes.
Can someone suggest what the system requirements of each server would be in this case?
The system prerequisites link suggests:
Memory - 2 GB, 1 GB heap size
Disk - 1 GB
assuming this is to handle one ESB instance (worker or manager node).
Thanks in advance,
Sai.
As a minimum, the system requirement would be 2 GB for the ESB worker JVM plus appropriate memory for the OS (assume 2 GB for Linux in this case), which makes 4 GB minimum per server. Of course, depending on the type of work done and the load, this requirement might increase.
The worker/manager separation is for separation of concerns. Hence, in a typical production deployment, you might have a single manager node (same specs) and 2 worker nodes, where only the worker nodes handle traffic.