Hive LLAP low Vcore allocation - hive

Problem Statement:
Hive LLAP daemons are not consuming the cluster's vCPU allocation. 80-100 cores are available for the LLAP daemons, but only 16 are being used.
Summary:
I am testing Hive LLAP on Azure using 2 D14_v2 head nodes, 16 D14_v2 worker nodes, and 3 A-series ZooKeeper nodes (D14_v2 = 112 GB RAM / 12 vCPU).
15 of the 16 nodes in the worker cluster are dedicated to LLAP.
The distribution is HDP 2.6.3.2-14.
Currently the cluster has a total of 1.56 TB of RAM available and 128 vCPUs. The LLAP daemons are allocated the proper amount of memory, but they only use 16 vCPUs in total (1 vCPU per daemon + 1 vCPU for Slider).
Configuration:
My relevant hive configs are as follows:
hive.llap.daemon.num.executors = 10 (10 of the 12 available vCPUs per node)
YARN max vCores per container = 8
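For reference, a few properties typically govern how many vCores each LLAP daemon container requests and how many vCores YARN will grant to a single container on HDP 2.6; the values shown here are illustrative, not taken from this cluster:

yarn.nodemanager.resource.cpu-vcores = 12          # vCores each NodeManager advertises to YARN (illustrative)
yarn.scheduler.maximum-allocation-vcores = 12      # largest vCore request YARN will grant to one container (illustrative)
hive.llap.daemon.vcpus.per.instance = 10           # vCores the LLAP daemon container asks for (illustrative)
hive.llap.daemon.num.executors = 10                # executor threads per daemon (value from this cluster)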
Other:
I have been load testing the cluster but have been unable to get any more vCPUs engaged in the process. Any thoughts or insights would be greatly appreciated.

The Resource Manager UI will only show you the query coordinators' and Slider's core and memory allocation; each query coordinator in LLAP occupies 1 core and the minimum allotted Tez AM memory (tez.am.resource.memory.mb). To check real-time core usage by the LLAP service on HDP 2.6.3, follow the steps below:
Ambari -> Hive -> Quick Links -> Grafana -> Hive LLAP Overview -> Total Execution Slots
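As a rough sanity check with the numbers from the question, the Grafana panel should report roughly the daemon count times the executors per daemon, and the llapstatus service command shipped with Hive on HDP 2.6 can confirm how many daemons and executors are actually up; the application name below is a placeholder:

expected Total Execution Slots = 15 daemons x 10 executors per daemon = 150
hive --service llapstatus --name <llap-app-name>    # lists running LLAP daemons and executors per daemon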

Related

EC2 VM running Standalone Confluent S3 sink connector has a difference of 36 GB between NetworkIn & NetworkOut values

I have an EC2 VM running Confluent's S3 sink connector in standalone mode for benchmarking an MSK Serverless cluster.
Network in for this VM over 30 minutes = 112.4 GB
Network out for this VM over 30 minutes = 75.8 GB
S3 sink size over 30 minutes = 74 GB
I'm unable to explain the difference of 36.6 GB between what the VM is ingesting from the MSK Serverless cluster and what it is persisting to the S3 bucket.
The VM is an m5.4xlarge instance with a 56 GB heap size and 40% CPU utilization over the course of the run, so it can't be a lack of compute or memory capacity. This process is also the sole tenant of the VM. I'm using SSH from my local machine to start and stop the connector on the EC2 instance.
The data is being produced by the Confluent Datagen connector running on a separate VM in standalone mode with the same specs. The NetworkOut for the producer VM matches the NetworkIn for this S3 sink VM.
The bucket is in the same region as the EC2 instance and the MSK
serverless cluster. I'm even using a S3 endpoint gateway.
The topic the connector reads from has 100 partitions, a replication factor of 3, and 2 in-sync replicas. My consumer lag stats are:
SumOffsetLag=1.15M
EstimatedMaxTimelag=18.5s
maxOffSetLag=37.7K
This is the configuration I'm using for the S3 sink connector:
format.class=io.confluent.connect.s3.format.json.JsonFormat
flush.size=50000
rotate.interval.ms=-1
rotate.schedule.interval.ms=-1
s3.credentials.provider.class=com.amazonaws.auth.DefaultAWSCredentialsProviderChain
storage.class=io.confluent.connect.s3.storage.S3Storage
schema.compatibility=NONE
partitioner.class=io.confluent.connect.storage.partitioner.DefaultPartitioner
consumer.override.security.protocol=SASL_SSL
consumer.override.sasl.mechanism=AWS_MSK_IAM
consumer.override.sasl.jaas.config=software.amazon.msk.auth.iam.IAMLoginModule required;
consumer.override.sasl.client.callback.handler.class=software.amazon.msk.auth.iam.IAMClientCallbackHandler
schemas.enable=false
connector.class=io.confluent.connect.s3.S3SinkConnector
time.interval=HOURLY
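One way to cross-check the lag numbers from the VM itself is to describe the connector's consumer group with the stock Kafka CLI; Connect names a sink connector's consumer group connect-<connector name>, and the bootstrap endpoint, group name, and client properties file below are placeholders for this setup:

kafka-consumer-groups.sh --bootstrap-server <msk-serverless-endpoint>:9098 \
  --command-config client-iam.properties \
  --describe --group connect-<s3-sink-connector-name>
# client-iam.properties would carry the same SASL_SSL / AWS_MSK_IAM settings that the
# consumer.override.* lines above pass to the connector's consumer.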

redis cluster total size

I have a quick question about Redis Cluster.
I'm setting up a Redis cluster on Google Kubernetes Engine. I'm using the n1-highmem-2 machine type with 13 GB RAM, but I'm slightly confused about how to calculate the total available size of the cluster.
I have 3 nodes, each with 13 GB of RAM. I'm running 6 pods (2 on each node), 1 master and 1 slave per node. This all works. I've assigned 6 GB of RAM to each pod in my pod definition YAML file.
Is it correct to say that my total cluster size would be 18 GB (3 masters * 6 GB), or can I count the slaves' size toward the total size of the Redis cluster?
Redis Cluster master-slave model
In order to remain available when a subset of master nodes are failing or are not able to communicate with the majority of nodes, Redis Cluster uses a master-slave model where every hash slot has from 1 (the master itself) to N replicas (N-1 additional slave nodes).
So, slaves are replicas (read-only) of masters (read-write) for availability; hence your total workable size is the size of your master pods.
Keep in mind, though, that leaving masters and slaves on the same Kubernetes node only protects against pod failure, not node failure, and you should consider redistributing them.
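To make that concrete with the numbers above, and assuming each pod's Redis maxmemory roughly matches the 6 GB pod allocation (the pod name is a placeholder):

usable cluster capacity = 3 masters x 6 GB = 18 GB (slaves add availability, not capacity)
kubectl exec -it <redis-master-pod> -- redis-cli config get maxmemory   # per-pod memory limit actually set in Redis
kubectl exec -it <redis-master-pod> -- redis-cli cluster info           # confirm cluster state and slot coverage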
You didn't mention how you are installing Redis, but I'd like to mention the Bitnami Redis Helm chart, as it's built for production use, deploys 1 master and 3 slaves for good fault tolerance, and has plenty of configuration options that are easily customized through the values.yaml file.

RabbitMQ problems with node on cluster

I have some problems with my RabbitMQ HA cluster.
The problems are as follows:
I have 3 nodes in the cluster.
Nodes 2 and 3 are joined with node 1.
When there is load, it goes to node 1 and almost all of its RAM is used.
If I switch nodes, all load goes to the next node, but its RAM usage is lower than on node 1.
Memory investigation shows that at that moment all the RAM is used by RabbitMQ binary memory, but the binaries only use 1 GB of memory while 5 GB are allocated.
If I switch back, node 1 again uses more RAM than the other nodes.
What is the problem in this case?
Can anybody help me to solve this issue?
If you need more information or screenshots I can send them to you.
RabbitMQ 3.6.10
Erlang 20.3
Traffic reaches RabbitMQ via HAProxy running on the same server where RabbitMQ is located.
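A few diagnostics that might help narrow this down; the node name is a placeholder and the exact memory-breakdown keys differ between RabbitMQ versions, so treat this as a sketch:

rabbitmqctl -n rabbit@<node1-hostname> status                                        # includes the per-category memory breakdown (binary, queue_procs, ...)
rabbitmqctl -n rabbit@<node1-hostname> list_queues name messages memory consumers    # per-queue backlog and memory
rabbitmqctl -n rabbit@<node1-hostname> eval 'erlang:memory().'                       # raw Erlang VM memory counters (used vs. allocated)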

Yarn Resource manager not utilising entire cluster resources

We recently moved the YARN ResourceManager daemon from a static machine to a cloud machine. The same configuration is being used: yarn-site.xml and capacity-scheduler.xml are unchanged apart from the machine name.
The rest of the cluster is a mix of both static and cloud machines.
The ResourceManager UI shows all nodes as live, and the available memory and vCore capacity is accurately reported.
root.default stats: default: capacity=0.9,
absoluteCapacity=0.9,
usedResources=<memory:12288, vCores:6>
usedCapacity=0.043290164,
absoluteUsedCapacity=0.038961038,
numApps=7,
numContainers=6
Queue configuration details:
Queue State: RUNNING
Used Capacity: 3.6%
Absolute Used Capacity: 3.2%
Absolute Capacity: 90.0%
Absolute Max Capacity: 95.0%
Used Resources: <memory:10240, vCores:5>
Num Schedulable Applications: 5
Num Non-Schedulable Applications: 0
Num Containers: 5
Max Applications: 9000
Max Applications Per User: 9000
Max Schedulable Applications: 29260
Max Schedulable Applications Per User: 27720
Configured Capacity: 90.0%
Configured Max Capacity: 95.0%
Configured Minimum User Limit Percent: 100%
Configured User Limit Factor: 1.0
Active users: thirdeye <Memory: 10240 (100.00%), vCores: 5 (100.00%), Schedulable Apps: 5, Non-Schedulable Apps: 0>
At any point in time only 4-6 containers are being allocated. These are allocated on both the static and cloud NodeManagers of the cluster.
What could be the reason for the under-utilization of the cluster resources?
Submitted jobs are piling up (currently at around 7K).
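A few hedged checks that are often useful when containers stall at a handful despite plenty of free capacity; the commands are stock YARN CLI, and the property names are the usual capacity-scheduler.xml knobs with illustrative values rather than this cluster's settings:

yarn node -list -all        # per-node used vs. available memory and vCores
yarn queue -status default  # live queue usage, AM resource limit, user limits
# capacity-scheduler.xml settings that commonly cap concurrency (illustrative):
# yarn.scheduler.capacity.maximum-am-resource-percent = 0.1
# yarn.scheduler.capacity.root.default.user-limit-factor = 1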

WSO2 ESB High Availability/ clustering Environment system requirements

I want information on the WSO2 ESB clustering system requirements for production deployment on Linux.
I went through the following link: ESB clustering.
I understand that more than one copy of WSO2 ESB would be extracted and set up on a single server for the worker nodes, and similarly on the other server for the manager (DepSync and admin) and worker nodes.
Can someone suggest what the system requirements of each server would be in this case?
The system prerequisites link suggests:
Memory - 2 GB, 1 GB heap size
Disk - 1 GB
I assume this is to handle one ESB instance (worker or manager node).
Thanks in advance,
Sai.
As a minimum, the system requirement would be 2 GB for the ESB worker JVM (plus appropriate memory for the OS; assume 2 GB for Linux in this case), which comes to 4 GB minimum per server. Of course, based on the type of work done and the load, this requirement might increase.
The worker/manager separation is for separation of concerns. Hence, in a typical production deployment, you might have a single manager node (same specs) and 2 worker nodes, where only the worker nodes handle traffic.
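Putting the answer's numbers together for the topology it describes (1 manager + 2 workers at the same specs) gives a rough sizing sketch; the heap line is an assumption, since the exact JVM defaults in wso2server.sh vary by ESB version:

per server : 2 GB ESB JVM heap + ~2 GB for the OS = ~4 GB RAM minimum, plus >= 1 GB disk per ESB instance
cluster    : 3 servers (1 manager + 2 workers) x 4 GB = ~12 GB RAM minimum in total
# heap is set through the -Xms/-Xmx JVM options in <ESB_HOME>/bin/wso2server.sh, e.g. -Xms2g -Xmx2g (illustrative)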