Could anybody explain to me the optimal configuration and parallelism for high-performance Flink jobs on YARN?
I use Cloudera Hadoop with 4 nodes (1 main node + 3 worker nodes), each with 12 CPU cores and 96 GB of memory.
There are a few relevant YARN properties:
yarn.scheduler.maximum-allocation-mb - current value is 36 GB
yarn.nodemanager.resource.memory-mb - current value is 36 GB
I have found out that when I start a yarn-session I should set the -tm flag to no more than 36 GB, otherwise my application fails with the error: The cluster does not have the requested resources for the TaskManagers available! Maximum Memory: <...> Requested: <...>MB. Please check the 'yarn.scheduler.maximum-allocation-mb' and the 'yarn.nodemanager.resource.memory-mb' configuration values otherwise.
I want to use all of the resources available on the cluster to increase the performance of my Flink jobs. So my questions are:
Should I set the above properties to almost 96 GB (e.g. 88 GB) and use 1 TaskManager per worker node with 12 slots (3 TaskManagers across 3 nodes, 36 slots in total)? Is it common to use such huge TaskManagers on YARN?
Or should I set yarn.nodemanager.resource.memory-mb to 88 GB and yarn.scheduler.maximum-allocation-mb to 8 GB and run multiple TaskManagers per worker node? For example, 6 TaskManagers per node with 2 slots each (18 TaskManagers across 3 nodes, 36 slots in total)? I have read that it is not recommended to set yarn.scheduler.maximum-allocation-mb too high.
Is it enough to set the -jm flag to 4 GB for the Flink session? Are there any recommendations for this?
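For reference, I would start the session for the first option roughly like this (a sketch on my part; the -n, -s and -d flags and whether the memory values take a unit suffix depend on the Flink version, and the numbers are just the ones from the question, with 90112 MB = 88 GB):
$ ./bin/yarn-session.sh -n 3 -jm 4096 -tm 90112 -s 12 -d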
Please explain how this impacts network traffic, garbage collection, CPU and memory utilization, etc. Which configuration should I use for the best performance?
Thank you for any help!
We are using ElastiCache for Redis, and are confused by its Evictions metric.
I'm curious what the unit is on the evicted_keys metric from Redis INFO. The ElastiCache docs (https://docs.aws.amazon.com/AmazonElastiCache/latest/red-ug/CacheMetrics.Redis.html) say it is a count, but for our application we have observed that the "Evictions" metric (which is derived from evicted_keys) fluctuates up and down, indicating it's not a count. I would expect a count to never decrease, since we cannot "un-evict" a key. I'm wondering if evicted_keys is actually a rate (e.g. evictions/sec), which would explain why it can fluctuate.
Thank you in advance for any responses!
From the INFO command:
evicted_keys: Number of evicted keys due to maxmemory limit
To learn more about evictions see Using Redis as an LRU cache - Eviction policies
This counter is zero when the server starts, and it is only reset if you issue the CONFIG RESETSTAT command. However, on ElastiCache, this command is not available.
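For reference, you can check the raw counter directly against your node (the endpoint below is a placeholder):
$ redis-cli -h my-redis.xxxxxx.0001.use1.cache.amazonaws.com -p 6379 INFO stats | grep evicted_keys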
That said, ElastiCache derives the metric from this value by calculating the difference between consecutive data points.
Redis evicted_keys      0   5   12   18   22   ...
CloudWatch Evictions    0   5    7    6    4   ...
This is the usual pattern for CloudWatch metrics. It allows you to use SUM if you want the cumulative value (summing the CloudWatch data points above, 0 + 5 + 7 + 6 + 4 = 22, recovers the last cumulative Redis value), and it also makes it easy to detect rate changes or spikes.
Suppose, for example, that you want to alarm if evictions exceed 10,000 over a one-minute period. If ElastiCache stored the cumulative value from Redis directly as a metric, this would be hard to accomplish.
Also, by committing the metric only as the keys evicted during the period, you are protected from the data distortion of a server reset or a value overflow. While the Redis INFO value would go back to zero, on ElastiCache you still get the value for the period and you can still do a running sum over any period.
I am looking for suggestions on my RocksDB configuration. Our use case is to load 100 GB of key-value pairs into RocksDB and at run time only serve the key-value pairs from the db. Keys are 32 bytes and values are 1.6 KB in size.
What we have right now: we use Hadoop to generate a 100 GB SST file with the SstFileWriter API and save it off in S3. Each new server that comes up ingests the file using db.ingestExternalFile(..). We use an i3.large machine (15.25 GiB RAM | 2 vCPUs | 475 GiB NVMe SSD). The P95 and average response times from RocksDB with the current configuration:
BlockSize = 2KB
FormatVersion = 4
Read-Write=100% read at runtime
are ~1 ms, but the P99 and PMAX are pretty bad. We are looking for some way to reduce the PMAX response time, which is ~10x the P95.
Thanks.
Can you load the whole DB into memory using tmpfs, i.e. just copy the data to RAM, and see if that helps? It may also make sense to compact the SST files in a separate job rather than ingesting them at startup. This may require changing your machine configuration to be geared more towards RAM and less towards SSD storage.
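A minimal sketch of the tmpfs idea, assuming the DB currently lives under /data/rocksdb and the instance has enough RAM to hold it (paths and size are placeholders):
$ sudo mkdir -p /mnt/rocksdb-ram
$ sudo mount -t tmpfs -o size=110g tmpfs /mnt/rocksdb-ram
$ cp -r /data/rocksdb/. /mnt/rocksdb-ram/
Then point the application's RocksDB path at /mnt/rocksdb-ram instead of the SSD-backed directory.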
We have run into this problem (available_pct has dropped to 4% and hwm-breached is true). The configuration is as follows:
Aerospike version : 3.14
Underlying hard disk : non-SSD
Variable Name Value
memory-size 5 GB
free-pct-memory 98 %
available_pct 4 %
max-void-time 0 millisec
stop-writes 0
stop-writes-pct 90 %
hwm-breached true
default-ttl 604,800 sec
max-ttl 315,360,000 sec
enable-xdr false
single-bin false
data-in-memory false
Can anybody please help us out? What could be a potential reason for this?
Aerospike only writes to free blocks. A block may contain any number of records that fit. If your write/update pattern is such that a block never falls below 50% active records (the default threshold for defragmenting: defrag-lwm-pct), then you have a bunch of "empty" space that can't be utilized. Read more about defrag in the managing storage page.
Recovering from this is much easier with a cluster that's not seeing any writes. You can increase defrag-lwm-pct so that more blocks become eligible and get defragmented.
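For example, defrag-lwm-pct can be raised dynamically with asinfo (a sketch; the namespace name is a placeholder, and the change should also be persisted in aerospike.conf):
$ asinfo -v "set-config:context=namespace;id=myNamespace;defrag-lwm-pct=60"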
Another cause could be just that the HDD isn't fast enough to keep up with defragmentation.
You can read more on possible resolutions in the Aerospike KB - Recovering from Available Percent Zero. Don't read past "Stop service on a node..."
You are basically not defragging your persistence storage device (75 GB per node). From the snapshot you have posted, you have about a million records on 3 nodes with 21 million expired. So it looks like you are writing records with a very short TTL, and defrag is unable to keep up.
Can you post a few lines of output, while you are in this state, from:
$ grep defrag /var/log/aerospike/aerospike.log
and
$ grep thr_nsup /var/log/aerospike/aerospike.log ?
What is your write/update load? My suspicion is that you are only creating short-TTL records and reading them, not updating.
Depending on what you are doing, increasing defrag-lwm-pct may actually make things worse for you. I would also tweak nsup-delete-sleep from its default of 100 microseconds, but that will depend on what your log greps above show. So post those, and let's see.
(Edit: Also, the fact that you are not seeing evictions even though you are above the 50% HWM on persistent storage means your nsup thread is taking a very long time to run. That again points to the nsup-delete-sleep value needing tuning for your setup.)
I want to monitor my Redis cache cluster on ElastiCache. From AWS/ElastiCache I am able to get metrics like FreeableMemory and BytesUsedForCache. If I am not wrong, BytesUsedForCache is the memory used by the cluster (assuming there is only one node in the cluster). I want to calculate the percentage of memory used. Can anyone help me get the percentage of memory usage in Redis?
We had the same issue since we wanted to monitor the percentage of ElastiCache Redis memory that is consumed by our data.
As you wrote correctly, you need to look at BytesUsedForCache - that is the amount of memory (in bytes) consumed by the data you've stored in Redis.
The other two important numbers are
The available RAM of the AWS instance type you use for your ElastiCache node, see https://aws.amazon.com/elasticache/pricing/
Your value for parameter reserved-memory-percent (check your ElastiCache parameter group). That's the percentage of RAM that is reserved for "nondata purposes", i.e. for the OS and whatever AWS needs to run there to manage your ElastiCache node. By default this is 25 %. See https://docs.aws.amazon.com/AmazonElastiCache/latest/red-ug/redis-memory-management.html#redis-memory-management-parameters
So the total available memory for your data in ElastiCache is
(100 - reserved-memory-percent) / 100 * instance-RAM-size
(In our case, we use instance type cache.r5.2xlarge with 52.82 GB RAM, and we have the default setting of reserved-memory-percent = 25%.
Checking with the INFO command in Redis, I see that maxmemory_human = 39.61 GB, which is equal to 75% of 52.82 GB.)
So the ratio of used memory to available memory is
BytesUsedForCache / ((100 - reserved-memory-percent) / 100 * instance-RAM-size)
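For illustration (the 20 GB figure is just an assumed example value): on a cache.r5.2xlarge with the default reserved-memory-percent of 25%, the available memory is 0.75 * 52.82 GB ≈ 39.6 GB, so BytesUsedForCache = 20 GB corresponds to 20 / 39.6 ≈ 50% memory usage.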
By comparing the FreeableMemory and BytesUsedForCache metrics, you can derive the available memory for ElastiCache in non-cluster mode (not sure if it applies to cluster mode too).
Here is the NRQL we're using to monitor the cache:
SELECT Max(`provider.bytesUsedForCache.Sum`) / (Max(`provider.bytesUsedForCache.Sum`) + Min(`provider.freeableMemory.Sum`)) * 100 FROM DatastoreSample WHERE provider = 'ElastiCacheRedisNode'
This is based on the following:
FreeableMemory: The amount of free memory available on the host. This is derived from the RAM, buffers and cache that the OS reports as freeable. (AWS CacheMetrics: HostLevel)
BytesUsedForCache: The total number of bytes allocated by Redis for all purposes, including the dataset, buffers, etc. This is derived from the used_memory statistic in Redis INFO. (AWS CacheMetrics: Redis)
So BytesUsedForCache (the amount of memory used by Redis) + FreeableMemory (the amount of memory Redis still has access to) = total memory that Redis can use.
With the release of the 18 additional CloudWatch metrics, you can now use DatabaseMemoryUsagePercentage to see the percentage of memory utilization in Redis.
View more about the metric in the memory section here
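For example, you can pull that metric with the AWS CLI (the cluster id and time range below are placeholders):
$ aws cloudwatch get-metric-statistics \
    --namespace AWS/ElastiCache \
    --metric-name DatabaseMemoryUsagePercentage \
    --dimensions Name=CacheClusterId,Value=my-redis-001 \
    --start-time 2020-06-01T00:00:00Z --end-time 2020-06-01T01:00:00Z \
    --period 300 --statistics Average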
You would have to calculate this based on the size of the node you have selected. See these 2 posts for more information.
Pricing doc gives you the size of your setup.
https://aws.amazon.com/elasticache/pricing/
https://forums.aws.amazon.com/thread.jspa?threadID=141154
I am new to Hadoop and I am not yet familiar with its configuration.
I just want to ask what the maximum number of containers per node is.
I am using a single-node cluster (a laptop with 6 GB of RAM),
and below are my mapred and yarn configurations:
**mapred-site.xml**
map-mb : 4096 opts:-Xmx3072m
reduce-mb : 8192 opts:-Xmx6144m
**yarn-site.xml**
resource memory-mb : 40GB
min allocation-mb : 1GB
The above setup can only run 4 to 5 jobs, with a maximum of 8 containers.
The maximum number of containers that run on a single NodeManager (Hadoop worker) depends on a lot of factors, such as how much memory is assigned to the NodeManager and the application-specific resource requirements.
The defaults for yarn.scheduler.*-allocation-* are: 1 GB (minimum allocation), 8 GB (maximum allocation), 1 core (minimum) and 32 cores (maximum). So the minimum and maximum allocation settings affect the number of containers per node.
So, if you have 6 GB of RAM and 4 virtual cores, here is how the YARN configuration should look:
yarn.scheduler.minimum-allocation-mb: 128
yarn.scheduler.maximum-allocation-mb: 2048
yarn.scheduler.minimum-allocation-vcores: 1
yarn.scheduler.maximum-allocation-vcores: 2
yarn.nodemanager.resource.memory-mb: 4096
yarn.nodemanager.resource.cpu-vcores: 4
The above configuration tells Hadoop to use at most 4 GB and 4 virtual cores on the node, and that each container can have between 128 MB and 2 GB of memory and between 1 and 2 virtual cores. With these settings you could run up to 2 containers with maximum resources at a time (4096 MB / 2048 MB = 2).
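If it helps, these settings go into yarn-site.xml as regular properties, e.g. (a sketch showing two of the keys with the values above):
<property>
  <name>yarn.nodemanager.resource.memory-mb</name>
  <value>4096</value>
</property>
<property>
  <name>yarn.scheduler.maximum-allocation-mb</name>
  <value>2048</value>
</property>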
Now, for the MapReduce-specific configuration:
yarn.app.mapreduce.am.resource.mb: 1024
yarn.app.mapreduce.am.command-opts: -Xmx768m
mapreduce.[map|reduce].cpu.vcores: 1
mapreduce.[map|reduce].memory.mb: 1024
mapreduce.[map|reduce].java.opts: -Xmx768m
With this configuration, you could theoretically have up to 4 mappers/reducers running simultaneously in four 1 GB containers. In practice, the MapReduce ApplicationMaster will use a 1 GB container itself, so the actual number of concurrent mappers and reducers will be limited to 3. You can play around with the memory limits, but it might require some experimentation to find the best ones.
As a rule of thumb, you should limit the heap size to about 75% of the total memory available to the container to ensure things run more smoothly (e.g. -Xmx768m for a 1024 MB container, as in the settings above).
You could also set the memory per container using the yarn.scheduler.minimum-allocation-mb property.
For a more detailed configuration for production systems, use this document from Hortonworks as a reference.