Slow SELECT DISTINCT with persistence - Ignite

Using Ignite 2.13.0, we are trying to generalize the use of Ignite persistence, but somehow we are doing something wrong. When the number of objects in the cache increases linearly, the execution time of a "SELECT DISTINCT" that uses an index increases exponentially. I prepared a small test to reproduce the problem: https://github.com/hostettler/ignite-slow-query-persistence.
Without persistence:
655,360 objects in the caches => select distinct: 472 ms
x5: 3,276,750 objects in the caches => select distinct: 2,141 ms
x10: 6,553,500 objects in the caches => select distinct: 4,354 ms
x20: cannot run, because I did not give it enough off-heap memory (the objective is to test persistence)
With persistence:
655,360 objects in the caches => select distinct: 561 ms
x5: 3,276,750 objects in the caches => select distinct: 3,056 ms
x10: 6,553,500 objects in the caches => select distinct: 6,381 ms
x20: 13,107,000 objects => select distinct: 284,617 ms
0.5 s for a SELECT DISTINCT over an index on 655k records is not fast to begin with, but the degradation between x10 and x20 is far worse than linear. The query does use the index, as shown by the explain plan:
SELECT DISTINCT
__Z0.DATAGROUPID AS __C0_0
FROM "cache".CONTRACTDTOSIMPLE __Z0
/* "cache".CONTRACTDTOSIMPLE_DATAGROUPID_IDX */
I have already tried tuning all the persistence parameters, and the index seems to be used in every case.
Going from x10 to x20, the query becomes around 50x slower, so relying on persistence has a huge impact on performance. Any idea what I am missing?
Thanks a lot in advance
P.S.: I tried the Calcite engine and the results are even worse.
----- EDIT -----
I did more tests on an Azure D8ds v5 (same repository); let me put the results here.
Off-heap is set to 4 GB, on-heap is set to 4 GB.
Volume | Persistence enabled | WAL mode | Throttling enabled | Page replacement | Insert time | SQL query | Partitioned SQL | Partitioned scan query
6.5M   | false               | N/A      | N/A                | N/A              | 96.5s       | 3.4s      | 2.6s            | 0.7s
6.5M   | true                | NONE     | false              | CLOCK            | 192s        | 5.6s      | 3.9s            | 1.1s
13M    | true                | NONE     | false              | CLOCK            | 1063s       | 197s      | 188s            | 94s
13M    | true                | NONE     | true               | CLOCK            | 1047s       | 196s      | 102s            | 14s
13M    | true                | NONE     | true               | SEGMENTED_LRU    | 1042s       | 211s      | 205s            | 14s
13M    | true                | LOG_ONLY | false              | CLOCK            | 1575s       | 225s      | 188s            | 14s
13M    | true                | LOG_ONLY | true               | CLOCK            | 1586s       | 185s      | 102s            | 15s
13M    | true                | LOG_ONLY | true               | SEGMENTED_LRU    | 1550s       | 259s      | 150s            | 15s
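For context, here is a minimal Java sketch (assumed, not the exact code from the repository) of how the knobs in the table map onto Ignite's data storage configuration:

// Sketch only: maps the table's columns onto Ignite's DataStorageConfiguration.
// The concrete values (4 GB region, LOG_ONLY WAL, throttling on, SEGMENTED_LRU)
// are examples from one of the rows, not the repository's exact settings.
import org.apache.ignite.configuration.DataRegionConfiguration;
import org.apache.ignite.configuration.DataStorageConfiguration;
import org.apache.ignite.configuration.IgniteConfiguration;
import org.apache.ignite.configuration.PageReplacementMode;
import org.apache.ignite.configuration.WALMode;

public class StorageConfigSketch {
    public static IgniteConfiguration config() {
        DataRegionConfiguration region = new DataRegionConfiguration()
            .setName("default")
            .setPersistenceEnabled(true)                                // "Persistence enabled"
            .setMaxSize(4L * 1024 * 1024 * 1024)                        // 4 GB off-heap
            .setPageReplacementMode(PageReplacementMode.SEGMENTED_LRU); // "Page replacement"

        DataStorageConfiguration storage = new DataStorageConfiguration()
            .setDefaultDataRegionConfiguration(region)
            .setWalMode(WALMode.LOG_ONLY)                               // "WAL mode"
            .setWriteThrottlingEnabled(true);                           // "Throttling enabled"

        return new IgniteConfiguration().setDataStorageConfiguration(storage);
    }
}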

Related

MariaDB Server Optimization - reducing I/O operation

I need help with MariaDB server optimization. I have a lot of I/O operations (created tmp disk tables) and I want to reduce them.
Hardware: CPU 20 x 2197 MHz, RAM 50 GB, SSD disks in RAID 10
Software: 10.1.26-MariaDB-0+deb9u1 - Debian 9.1
The server handles WordPress databases (~1500).
Config:
key_buffer_size = 384M
max_allowed_packet = 5096M
thread_stack = 192K
thread_cache_size = 16
myisam_recover_options = BACKUP
max_connections = 200
table_cache = 12000
max_connect_errors = 20
open_files_limit = 30000
wait_timeout = 3600
interactive_timeout = 3600
query_cache_type = 0
query_cache_size = 0
query_cache_limit = 0
join_buffer_size = 2M
tmp_table_size = 1G
max_heap_table_size = 1G
table_open_cache = 15000
innodb_buffer_pool_size = 35G
innodb_buffer_pool_instances = 40
Are you using MyISAM? If you are, you shouldn't be. Convert any MyISAM tables outside the mysql schema to InnoDB and set key_buffer_size to 1M (see the SQL sketch after these suggestions).
max_allowed_packet = 5G is absurdly high.
thread_cache_size = 0 is the recommended default. Unless you really know what you are doing and have measurements to back it up, you should leave it alone.
join_buffer_size is another setting you should almost certainly not be touching.
max_heap_table_size = 1G is almost certainly too large - if you are getting temporary tables that big created in memory with any regularity, your server will run out of memory, grind to a halt and OOM anyway.
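If it helps, here is a hedged SQL sketch for finding and converting those tables (the schema/table names in the example output are placeholders; take backups first):

-- Sketch: list MyISAM tables outside the system schemas and generate conversion statements.
SELECT CONCAT('ALTER TABLE `', table_schema, '`.`', table_name, '` ENGINE=InnoDB;') AS stmt
FROM information_schema.tables
WHERE engine = 'MyISAM'
  AND table_schema NOT IN ('mysql', 'information_schema', 'performance_schema');

-- Example of one generated statement (database and table names are placeholders):
ALTER TABLE `wp_db`.`wp_posts` ENGINE=InnoDB;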
Yes, follow Gordan's advice.
Improve WP's indexing by following the advice here: http://mysql.rjweb.org/doc.php/index_cookbook_mysql#speeding_up_wp_postmeta
Turn on the slowlog to identify what queries are the worst: http://mysql.rjweb.org/doc.php/mysql_analysis#slow_queries_and_slowlog
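A quick, hedged sketch of the slowlog part (the threshold is an example; adjust to taste):

-- Sketch: enable the slow query log at runtime.
SET GLOBAL slow_query_log = ON;
SET GLOBAL long_query_time = 1;   -- seconds; log anything slower than this
-- To persist across restarts, add the equivalent settings to the [mysqld] section of my.cnf:
--   slow_query_log = ON
--   long_query_time = 1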

Elastalert rule for CPU usage in percentage

I am facing an issue with an ElastAlert rule for CPU usage (not load average). I am not getting any hits or matches. Below is my .yaml file for the CPU rule:
name: CPU usage
type: metric_aggregation
index: metricbeat-*
buffer_time:
minutes: 10
metric_agg_key: system.cpu.total.pct
metric_agg_type: avg
query_key: beat.hostname
doc_type: doc
bucket_interval:
minutes: 5
sync_bucket_interval: true
max_threshold: 60.0
filter:
- term:
metricset.name: cpu
alert:
- "email"
email:
- "xyz#xy.com"
Can you please help me understand what changes I need to make in my rule?
Any assistance will be appreciated.
Thanks.
Metricbeat reports CPU values in the range of 0 to 1. So a threshold of 60 will never be matched.
Try it with max_threshold: 0.6 and it probably will work.
Try reducing buffer_time and bucket_interval for testing.
The best way to debug an ElastAlert issue is to use the command-line option --es_debug_trace, like this: --es_debug_trace /tmp/output.txt. It shows the exact curl API call to Elasticsearch that ElastAlert makes in the background. The query can then be copied and used in Kibana's Dev Tools for easy analysis and fiddling.
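For example, an invocation might look like this (the config and rule file names are placeholders):

# Placeholder file names; /tmp/output.txt will contain the curl call ElastAlert sent,
# which can be pasted into Kibana Dev Tools.
elastalert --verbose --config config.yaml --rule rules/cpu_usage.yaml --es_debug_trace /tmp/output.txt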
Most likely, the doc_type: doc setting caused the ES endpoint to look like this: metricbeat-*/doc/_search
You might not have that doc mapping type, hence no match. Please remove doc_type and try again.
Also note that the pct value is less than 1, hence in your case: max_threshold: 0.6
For your reference, the following works for me:
name: CPU usage
type: metric_aggregation
use_strftime_index: true
index: metricbeat-system.cpu-%Y.%m.%d
buffer_time:
hour: 1
metric_agg_key: system.cpu.total.pct
metric_agg_type: avg
query_key: beat.hostname
min_doc_count: 1
bucket_interval:
minutes: 5
max_threshold: 0.6
filter:
- term:
metricset.name: cpu
realert:
hours: 2
...
sample match output:
{
'#timestamp': '2021-08-19T15:06:22Z',
'beat.hostname': 'MY_BUSY_SERVER',
'metric_system.cpu.total.pct_avg': 0.6155,
'num_hits': 50,
'num_matches': 10
}

arules apriori command hanging or just taking long

I used the apriori command from the arules package on a transaction object, and one of the CPUs went up to about 97% for 20 or so minutes. Then it went down to cycling between 0.7% and 0.3% and has been doing that for about 24 hours, and I still do not have the prompt back in RStudio; the cursor is just blinking. I have 2,666 transactions and 376 items. There is probably a lot of similarity among some of the transactions, meaning that some transactions can share over 100 items.
This is the first time I have used this package, so I was wondering whether this behavior is normal and what I should do.
I am running on CentOS 7 with 24 GB RAM and 16 CPUs, using RStudio Server.
My command:
rules <- apriori(adjacdmMtrans, parameter =list(support = 0.002, confidence=0.75))
Some info put out by arules after entering the above command:
Apriori
Parameter specification:
confidence minval smax arem aval originalSupport support minlen maxlen target ext
0.75 0.1 1 none FALSE TRUE 0.002 1 10 rules FALSE
Algorithmic control:
filter tree heap memopt load sort verbose
0.1 TRUE TRUE FALSE TRUE 2 TRUE
Absolute minimum support count: 5
set item appearances ...[0 item(s)] done [0.00s].
set transactions ...[376 item(s), 2666 transaction(s)] done [0.03s].
sorting and recoding items ... [376 item(s)] done [0.01s].
creating transaction tree ... done [0.00s].
checking subsets of size 1 2 3 4
Your machine probably runs out of memory for the R process and starts swapping. In the worst case, with 376 items apriori has to create on the order of 10^12 candidates of length 4. Start with a higher support value.
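For example, something along these lines (the support and maxlen values are illustrative starting points, not recommendations derived from this data set):

# Sketch: rerun with a higher minimum support and a cap on rule length.
# Assumes library(arules) is loaded and adjacdmMtrans exists as in the question.
rules <- apriori(adjacdmMtrans,
                 parameter = list(support = 0.05,     # much higher than 0.002
                                  confidence = 0.75,
                                  maxlen = 4))        # limit candidate length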

Wrong balance between Aerospike instances in cluster

I have an application with a high load for batch read operations. My Aerospike cluster (v 3.7.2) has 14 servers, each one with 7GB RAM and 2 CPUs in Google Cloud.
By looking at Google Cloud Monitoring Graphs, I noticed a very unbalanced load between servers: some servers have almost 100% CPU load, while others have less than 50% (image below). Even after hours of operation, the cluster unbalanced pattern doesn't change.
Is there any configuration that I could change to make this cluster more homogeneous? How to optimize node balancing?
Edit 1
All servers in the cluster have the same aerospike.conf file:
# Aerospike database configuration file.
service {
user root
group root
paxos-single-replica-limit 1 # Number of nodes where the replica count is automatically reduced to 1.
paxos-recovery-policy auto-reset-master
pidfile /var/run/aerospike/asd.pid
service-threads 32
transaction-queues 32
transaction-threads-per-queue 32
batch-index-threads 32
proto-fd-max 15000
batch-max-requests 200000
}
logging {
# Log file must be an absolute path.
file /var/log/aerospike/aerospike.log {
context any info
}
}
network {
service {
#address any
port 3000
}
heartbeat {
mode mesh
mesh-seed-address-port 10.240.0.6 3002
mesh-seed-address-port 10.240.0.5 3002
port 3002
interval 150
timeout 20
}
fabric {
port 3001
}
info {
port 3003
}
}
namespace test {
replication-factor 3
memory-size 5G
default-ttl 0 # 30 days, use 0 to never expire/evict.
ldt-enabled true
storage-engine device {
file /data/aerospike.dat
write-block-size 1M
filesize 180G
}
}
Edit 2:
$ asinfo
1 : node
BB90600F00A0142
2 : statistics
cluster_size=14;cluster_key=E3C3672DCDD7F51;cluster_integrity=true;objects=3739898;sub-records=0;total-bytes-disk=193273528320;used-bytes-disk=26018492544;free-pct-disk=86;total-bytes-memory=5368709120;used-bytes-memory=239353472;data-used-bytes-memory=0;index-used-bytes-memory=239353472;sindex-used-bytes-memory=0;free-pct-memory=95;stat_read_reqs=2881465329;stat_read_reqs_xdr=0;stat_read_success=2878457632;stat_read_errs_notfound=3007093;stat_read_errs_other=0;stat_write_reqs=551398;stat_write_reqs_xdr=0;stat_write_success=549522;stat_write_errs=90;stat_xdr_pipe_writes=0;stat_xdr_pipe_miss=0;stat_delete_success=4;stat_rw_timeout=1862;udf_read_reqs=0;udf_read_success=0;udf_read_errs_other=0;udf_write_reqs=0;udf_write_success=0;udf_write_err_others=0;udf_delete_reqs=0;udf_delete_success=0;udf_delete_err_others=0;udf_lua_errs=0;udf_scan_rec_reqs=0;udf_query_rec_reqs=0;udf_replica_writes=0;stat_proxy_reqs=7021;stat_proxy_reqs_xdr=0;stat_proxy_success=2121;stat_proxy_errs=4739;stat_ldt_proxy=0;stat_cluster_key_err_ack_dup_trans_reenqueue=607;stat_expired_objects=0;stat_evicted_objects=0;stat_deleted_set_objects=0;stat_evicted_objects_time=0;stat_zero_bin_records=0;stat_nsup_deletes_not_shipped=0;stat_compressed_pkts_received=0;err_tsvc_requests=110;err_tsvc_requests_timeout=0;err_out_of_space=0;err_duplicate_proxy_request=0;err_rw_request_not_found=17;err_rw_pending_limit=19;err_rw_cant_put_unique=0;geo_region_query_count=0;geo_region_query_cells=0;geo_region_query_points=0;geo_region_query_falsepos=0;fabric_msgs_sent=58002818;fabric_msgs_rcvd=57998870;paxos_principal=BB92B00F00A0142;migrate_msgs_sent=55749290;migrate_msgs_recv=55759692;migrate_progress_send=0;migrate_progress_recv=0;migrate_num_incoming_accepted=7228;migrate_num_incoming_refused=0;queue=0;transactions=101978550;reaped_fds=6;scans_active=0;basic_scans_succeeded=0;basic_scans_failed=0;aggr_scans_succeeded=0;aggr_scans_failed=0;udf_bg_scans_succeeded=0;udf_bg_scans_failed=0;batch_index_initiate=40457778;batch_index_queue=0:0,0:0,0:0,0:0,0:0,0:0,0:0,0:0,0:0,0:0,0:0,0:0,0:0,0:0,0:0,0:0,0:0,0:0,0:0,0:0,0:0,0:0,0:0,0:0,0:0,0:0,0:0,0:0,0:0,0:0,0:0,0:0;batch_index_complete=40456708;batch_index_timeout=1037;batch_index_errors=33;batch_index_unused_buffers=256;batch_index_huge_buffers=217168717;batch_index_created_buffers=217583519;batch_index_destroyed_buffers=217583263;batch_initiate=0;batch_queue=0;batch_tree_count=0;batch_timeout=0;batch_errors=0;info_queue=0;delete_queue=0;proxy_in_progress=0;proxy_initiate=7021;proxy_action=5519;proxy_retry=0;proxy_retry_q_full=0;proxy_unproxy=0;proxy_retry_same_dest=0;proxy_retry_new_dest=0;write_master=551089;write_prole=1055431;read_dup_prole=14232;rw_err_dup_internal=0;rw_err_dup_cluster_key=1814;rw_err_dup_send=0;rw_err_write_internal=0;rw_err_write_cluster_key=0;rw_err_write_send=0;rw_err_ack_internal=0;rw_err_ack_nomatch=1767;rw_err_ack_badnode=0;client_connections=366;waiting_transactions=0;tree_count=0;record_refs=3739898;record_locks=0;migrate_tx_objs=0;migrate_rx_objs=0;ongoing_write_reqs=0;err_storage_queue_full=0;partition_actual=296;partition_replica=572;partition_desync=0;partition_absent=3228;partition_zombie=0;partition_object_count=3739898;partition_ref_count=4096;system_free_mem_pct=61;sindex_ucgarbage_found=0;sindex_gc_locktimedout=0;sindex_gc_inactivity_dur=0;sindex_gc_activity_dur=0;sindex_gc_list_creation_time=0;sindex_gc_list_deletion_time=0;sindex_gc_objects_validated=0;sindex_gc_garbage_found=0;sindex_gc_garbage_cleaned=0;system_swapping=false;err_replica_null_node=0;err_r
eplica_non_null_node=0;err_sync_copy_null_master=0;storage_defrag_corrupt_record=0;err_write_fail_prole_unknown=0;err_write_fail_prole_generation=0;err_write_fail_unknown=0;err_write_fail_key_exists=0;err_write_fail_generation=0;err_write_fail_generation_xdr=0;err_write_fail_bin_exists=0;err_write_fail_parameter=0;err_write_fail_incompatible_type=0;err_write_fail_noxdr=0;err_write_fail_prole_delete=0;err_write_fail_not_found=0;err_write_fail_key_mismatch=0;err_write_fail_record_too_big=90;err_write_fail_bin_name=0;err_write_fail_bin_not_found=0;err_write_fail_forbidden=0;stat_duplicate_operation=53184;uptime=1001388;stat_write_errs_notfound=0;stat_write_errs_other=90;heartbeat_received_self=0;heartbeat_received_foreign=145137042;query_reqs=0;query_success=0;query_fail=0;query_abort=0;query_avg_rec_count=0;query_short_running=0;query_long_running=0;query_short_queue_full=0;query_long_queue_full=0;query_short_reqs=0;query_long_reqs=0;query_agg=0;query_agg_success=0;query_agg_err=0;query_agg_abort=0;query_agg_avg_rec_count=0;query_lookups=0;query_lookup_success=0;query_lookup_err=0;query_lookup_abort=0;query_lookup_avg_rec_count=0
3 : features
cdt-list;pipelining;geo;float;batch-index;replicas-all;replicas-master;replicas-prole;udf
4 : cluster-generation
61
5 : partition-generation
11811
6 : edition
Aerospike Community Edition
7 : version
Aerospike Community Edition build 3.7.2
8 : build
3.7.2
9 : services
10.0.3.1:3000;10.240.0.14:3000;10.0.3.1:3000;10.240.0.27:3000;10.0.3.1:3000;10.240.0.5:3000;10.0.3.1:3000;10.240.0.43:3000;10.0.3.1:3000;10.240.0.30:3000;10.0.3.1:3000;10.240.0.18:3000;10.0.3.1:3000;10.240.0.42:3000;10.0.3.1:3000;10.240.0.33:3000;10.0.3.1:3000;10.240.0.24:3000;10.0.3.1:3000;10.240.0.37:3000;10.0.3.1:3000;10.240.0.41:3000;10.0.3.1:3000;10.240.0.13:3000;10.0.3.1:3000;10.240.0.23:3000
10 : services-alumni
10.0.3.1:3000;10.240.0.42:3000;10.0.3.1:3000;10.240.0.5:3000;10.0.3.1:3000;10.240.0.13:3000;10.0.3.1:3000;10.240.0.14:3000;10.0.3.1:3000;10.240.0.18:3000;10.0.3.1:3000;10.240.0.23:3000;10.0.3.1:3000;10.240.0.24:3000;10.0.3.1:3000;10.240.0.27:3000;10.0.3.1:3000;10.240.0.30:3000;10.0.3.1:3000;10.240.0.37:3000;10.0.3.1:3000;10.240.0.43:3000;10.0.3.1:3000;10.240.0.33:3000;10.0.3.1:3000;10.240.0.41:3000
I have a few comments about your configuration. First, transaction-threads-per-queue should be set to 3 or 4 (don't set it to the number of cores).
The second has to do with your batch-read tuning. You're using the (default) batch-index protocol, and the config params you'll need to tune for batch-read performance are:
You have batch-max-requests set very high (200,000). This is probably affecting both your CPU load and your memory consumption. A slight imbalance in the number of keys you're accessing per node is enough to show up in the graphs you've shown; at least, this is possibly the issue. It's better to iterate over smaller batches than to try to fetch 200K records per node at a time (see the Java sketch after this list).
batch-index-threads – by default its value is 4, and you set it to 32 (of a max of 64). You should do this incrementally by running the same test and benchmarking the performance. On each iteration adjust higher, then down if it's decreased in performance. For example: test with 32, +8 = 40 , +8 = 48, -4 = 44. There's no easy rule-of-thumb for the setting, you'll need to tune through iterations on the hardware you'll be using, and monitor the performance.
batch-max-buffer-per-queue – this is more directly linked to the number of concurrent batch-read operations the node can support. Each batch-read request will consume at least one buffer (more if the data cannot fit in 128K). If you do not have enough of these allocated to support the number of concurrent batch-read requests you will get exceptions with error code 152 BATCH_QUEUES_FULL . Track and log such events clearly, because it means you need to raise this value. Note that this is the number of buffers per-queue. Each batch response worker thread has its own queue, so you'll have batch-index-threads x batch-max-buffer-per-queue buffers, each taking 128K of RAM. The batch-max-unused-buffers caps the memory usage of all these buffers combined, destroying unused buffers until their number is reduced. There's an overhead to allocating and destroying these buffers, so you do not want to set it too low compared to the total. Your current cost is 32 x 256 x 128KB = 1GB.
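As an illustration of the smaller-batches point above, a hedged Java client sketch (the namespace, set, and 5,000-key chunk size are placeholders, not tuned values):

// Sketch: split one huge batch read into smaller chunks instead of a single 200K-key request.
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

import com.aerospike.client.AerospikeClient;
import com.aerospike.client.Key;
import com.aerospike.client.Record;
import com.aerospike.client.policy.BatchPolicy;

public class ChunkedBatchRead {
    // Reads the given keys in chunks of chunkSize keys per batch call.
    public static List<Record> readInChunks(AerospikeClient client, Key[] keys, int chunkSize) {
        BatchPolicy policy = new BatchPolicy();
        List<Record> results = new ArrayList<>(keys.length);

        for (int i = 0; i < keys.length; i += chunkSize) {
            Key[] chunk = Arrays.copyOfRange(keys, i, Math.min(i + chunkSize, keys.length));
            // One moderate batch request per iteration; missing records come back as null.
            results.addAll(Arrays.asList(client.get(policy, chunk)));
        }
        return results;
    }
}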
Finally, you're storing your data on a filesystem. That's fine for development instances, but not recommended for production. In GCE you can provision either a SATA SSD or an NVMe SSD for your data storage, and those should be initialized, and used as block devices. Take a look at the GCE recommendations for more details. I suspect you have warnings in your log about the device not keeping up.
It's likely that one of your nodes is an outlier with regard to the number of partitions it has (and therefore the number of objects). You can confirm it with asadm -e 'asinfo -v "objects"'. If that's the case, you can terminate that node and bring up a new one. This will force the partitions to be redistributed. It does trigger a migration, which takes quite a bit longer on the CE server than on the EE one.
For anyone interested, Aerospike Enterprise 4.3 introduced 'uniform-balance', which homogeneously balances data partitions. Read more here: https://www.aerospike.com/blog/aerospike-4-3-all-flash-uniform-balance/

Aerospike cluster not reclaiming available blocks

We use Aerospike in our projects and have run into a strange problem.
We have a 3-node cluster, and after some node restarts it stops working.
So we built a test to illustrate the problem.
We set up a test cluster: 3 nodes, replication factor = 2.
Here is our namespace config
namespace test{
replication-factor 2
memory-size 100M
high-water-memory-pct 90
high-water-disk-pct 90
stop-writes-pct 95
single-bin true
default-ttl 0
storage-engine device {
cold-start-empty true
file /tmp/test.dat
write-block-size 1M
}
}
We wrote 100 MB of test data, after which we had this situation:
available pct was about 66% and disk usage about 34%.
All good. :)
Then we stopped one node. After migration we saw that available pct = 49% and disk usage = 50%.
We returned the node to the cluster, and after migration disk usage went back to roughly its previous level (about 32%), but available pct on the old nodes stayed at 49%.
We stopped a node one more time:
available pct = 31%
Repeating one more time, we got this situation:
available pct = 0%
Our cluster crashed, and clients get AerospikeException: Error Code 8: Server memory error.
So how can we reclaim available pct?
If your defrag-q is empty (and you can see whether it is by grepping the logs), then the issue is likely that your namespace is smaller than your post-write-queue. Blocks on the post-write-queue are not eligible for defragmentation, so you would see avail-pct trending down with no defragmentation to reclaim the space. By default the post-write-queue is 256 blocks, which with your 1M write-block-size equates to 256 MB. If your namespace is smaller than that, you will see avail-pct continue to drop until you hit stop-writes. You can reduce the size of the post-write-queue dynamically (i.e. no restart needed) using the following command; here I suggest 8 blocks:
asinfo -v 'set-config:context=namespace;id=<NAMESPACE>;post-write-queue=8'
If you are happy with this value you should amend your aerospike.conf to include it so that it persists after a node restart.
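For example, to check the defrag queue from the logs (assuming the default log location; the exact wording of the storage health line varies by Aerospike version):

# Look at recent storage/defrag lines; a non-empty defrag-q shows up in these log entries.
grep -i "defrag" /var/log/aerospike/aerospike.log | tail -n 5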