I'm having a hard time understanding the following from the Aerospike documentation:
http://www.aerospike.com/docs/client/python/usage/kvs/record-structure.html
A record also has two metadata values associated with it - the record generation (the number of times it has been modified) and its ttl. A ttl can be set to the number of seconds remaining until it is considered expired, or to 0 (for 'never expire', the default). Records whose ttl has expired will be garbage collected by the server. Each time the record is written or touched (using touch()) its ttl is reset.
My initial thought was that if the namespace's default-ttl policy is set to 0, then the records will not expire.
My Aerospike namespace configuration is the following :
namespace brand {
    replication-factor 2
    memory-size 4G
    default-ttl 0 # 30 days, use 0 to never expire/evict.
    storage-engine memory
    # To use file storage backing, comment out the line above and use the
    # following lines instead.
    set twitter {
        set-disable-eviction true
    }
    storage-engine device {
        file /opt/aerospike/data/bar.dat
        filesize 16G
        data-in-memory true # Store data in memory in addition to file.
    }
}
I made a few inserts into the database; when I retrieved the data I got the following:
{ name: 'test',
twitter: 'test',
domain: 'test.com',
description: 'Your First Round Fund' } { ttl: 4294967295, gen: 2 }
Somehow a ttl appeared on the record and other records have a ttl also. I don't want my records to be deleted from the database. How can I remove the ttl from the records and how can I prevent this from happening in the future?
"asadm -e 'show stat like ttl' Shows the following:
~~~~brand Namespace Statistics~~~~
NODE : 1
cold-start-evict-ttl: 4294967295
default-ttl : 0
max-ttl : 0
~~~~test Namespace Statistics~~~~~
NODE : 1
cold-start-evict-ttl: 4294967295
default-ttl : 2592000
max-ttl : 0
asinfo -v 'hist-dump:ns=brand;hist=ttl' shows the following:
brand:ttl=100,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0;
The node.js client's get operation metadata shows { ttl: 4294967295, gen: 2 }.
Alright! The TTL is an unsigned 32-bit integer, and you are seeing its maximum value (2^32 - 1); if it were interpreted as a signed integer it would be -1. That max value has special meaning in Aerospike: the record lives indefinitely. Aerospike has accepted -1 as 'never expire' for about a year now, and a ttl of 0 from the client means 'use the server default'.
The node.js client is based on our C client, which was changed to convert a ttl of 0 from the server to 0xFFFFFFFF (i.e. -1). This was mentioned in the C client's 3.0.51 release notes.
Thanks for highlighting this issue, looks like there are some stale docs around this area.
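To see this from application code, you can compare the ttl reported in the record metadata against the 'never expires' sentinel. A minimal sketch with the Python client (the node.js client behaves the same way); the host, namespace, set, and key below are placeholders:

import aerospike

# Connect to a local node; adjust the address for your cluster.
config = {'hosts': [('127.0.0.1', 3000)]}
client = aerospike.client(config).connect()

# (namespace, set, user key) - placeholder values matching the example above.
key = ('brand', 'twitter', 'test')
_, meta, bins = client.get(key)

# Depending on the client version, a record with no expiration is reported
# as 0xFFFFFFFF (4294967295) or as -1.
if meta['ttl'] in (0xFFFFFFFF, -1):
    print('record never expires (gen=%d)' % meta['gen'])
else:
    print('record expires in %d seconds' % meta['ttl'])

client.close()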
Related
Is there any restriction on the client-supplied ttl relative to the namespace's default-ttl? For example, can we set a ttl higher than the namespace's default-ttl while writing a record, or can it have any value irrespective of the namespace's default-ttl?
Post server version 4.9, if your namespace is configured with the default settings (default-ttl 0 and nsup-period 0), the server will not accept client writes with a ttl > 0.
If you want to create/update records with a finite ttl (up to the 10-year maximum), set nsup-period to a value greater than 0 seconds - this enables nsup. Once nsup is enabled, you can insert records with any ttl from the client.
Updating a record that still has remaining ttl - say 1 year - to ttl = 10 seconds from the client, i.e. setting the ttl below the remaining life in order to force it to expire out of the system, is a bad idea. If you restart the node, there is a 50% chance that the record will be resurrected. So it is recommended not to set a ttl from the client below the record's remaining life on the server.
Special client TTL values (illustrated in the sketch below):
-1 --> make the record live forever (never expire / unset its ttl),
-2 --> update the record without changing its current remaining TTL.
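Recent versions of the Aerospike clients expose these special values as constants. A hedged sketch with the Python client, passing them through the write metadata (the host, key, and bin names are placeholders):

import aerospike

config = {'hosts': [('127.0.0.1', 3000)]}
client = aerospike.client(config).connect()

key = ('brand', 'twitter', 'some-user-key')   # placeholder key

# 0: use the namespace default-ttl.
client.put(key, {'name': 'test'}, meta={'ttl': aerospike.TTL_NAMESPACE_DEFAULT})

# -1: never expire.
client.put(key, {'name': 'test'}, meta={'ttl': aerospike.TTL_NEVER_EXPIRE})

# -2: update bins without touching the record's remaining ttl.
client.put(key, {'domain': 'test.com'}, meta={'ttl': aerospike.TTL_DONT_UPDATE})

client.close()

If your client version predates these constants, the raw values 0, -1, and -2 can be passed directly in the meta dict.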
I am using Redis as a cache in my application, configured in Spring beans, with spring-data-redis 1.7.1 and jedis 2.9.0.
I would like to know how to set the race condition ttl in the configuration.
Please comment if you have any suggestions.
If I understand you right, you want the same as that Ruby repo, but in Java.
For that case you may want to put a technical lock key alongside the one you need.
get yourkey
(nil)
get <yourkey>::lock
// if (nil) then calculate, if t then wait. assuming (nil) here
setex <yourkey>::lock 30 t
OK
// calculations
set <yourkey> <result>
OK
del <yourkey>::lock
(integer) 1
Here, with setex, you set a lock key with a TTL of 30 seconds. You can use a different TTL if you want.
There is one problem with the code above - some time passes between checking the lock and acquiring it. To acquire the lock atomically, EVAL can be used:
eval "local lk=KEYS[1]..'::lock' local lock=redis.call('get',lk) if (lock==false) then redis.call('setex',lk,KEYS[2],'t') return 1 else return 0 end" 2 <yourkey> 30
This either returns 0 if the lock is already held, or sets the lock and returns 1.
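The same pattern can be driven from application code. A rough sketch with the redis-py client (the calls map directly onto Jedis methods if you stay in Java); compute_value and the key names are placeholders:

import redis

r = redis.Redis(host='localhost', port=6379, decode_responses=True)

# Same script as above: atomically check <key>::lock and set it with a ttl.
ACQUIRE_LOCK = """
local lk = KEYS[1] .. '::lock'
local lock = redis.call('get', lk)
if (lock == false) then
    redis.call('setex', lk, KEYS[2], 't')
    return 1
else
    return 0
end
"""

def compute_value(key):
    # Placeholder for the expensive calculation the lock protects.
    return 'result-for-' + key

def get_with_lock(key, lock_ttl=30):
    value = r.get(key)
    if value is not None:
        return value
    # The ttl is passed as the second KEYS entry to mirror the script above.
    if r.eval(ACQUIRE_LOCK, 2, key, lock_ttl) == 1:
        value = compute_value(key)
        r.set(key, value)
        r.delete(key + '::lock')
        return value
    # Someone else holds the lock: wait and retry, or serve stale data.
    return None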
I have an application with a heavy batch-read load. My Aerospike cluster (v3.7.2) has 14 servers, each with 7 GB of RAM and 2 CPUs, in Google Cloud.
Looking at the Google Cloud Monitoring graphs, I noticed a very unbalanced load between servers: some servers have almost 100% CPU load, while others are below 50% (image below). Even after hours of operation, the unbalanced pattern doesn't change.
Is there any configuration I could change to make this cluster more homogeneous? How can I optimize node balancing?
Edit 1
All servers in the cluster have the same aerospike.conf file:
# Aerospike database configuration file.
service {
    user root
    group root
    paxos-single-replica-limit 1 # Number of nodes where the replica count is automatically reduced to 1.
    paxos-recovery-policy auto-reset-master
    pidfile /var/run/aerospike/asd.pid
    service-threads 32
    transaction-queues 32
    transaction-threads-per-queue 32
    batch-index-threads 32
    proto-fd-max 15000
    batch-max-requests 200000
}

logging {
    # Log file must be an absolute path.
    file /var/log/aerospike/aerospike.log {
        context any info
    }
}

network {
    service {
        #address any
        port 3000
    }
    heartbeat {
        mode mesh
        mesh-seed-address-port 10.240.0.6 3002
        mesh-seed-address-port 10.240.0.5 3002
        port 3002
        interval 150
        timeout 20
    }
    fabric {
        port 3001
    }
    info {
        port 3003
    }
}

namespace test {
    replication-factor 3
    memory-size 5G
    default-ttl 0 # 30 days, use 0 to never expire/evict.
    ldt-enabled true
    storage-engine device {
        file /data/aerospike.dat
        write-block-size 1M
        filesize 180G
    }
}
Edit 2:
$ asinfo
1 : node
BB90600F00A0142
2 : statistics
cluster_size=14;cluster_key=E3C3672DCDD7F51;cluster_integrity=true;objects=3739898;sub-records=0;total-bytes-disk=193273528320;used-bytes-disk=26018492544;free-pct-disk=86;total-bytes-memory=5368709120;used-bytes-memory=239353472;data-used-bytes-memory=0;index-used-bytes-memory=239353472;sindex-used-bytes-memory=0;free-pct-memory=95;stat_read_reqs=2881465329;stat_read_reqs_xdr=0;stat_read_success=2878457632;stat_read_errs_notfound=3007093;stat_read_errs_other=0;stat_write_reqs=551398;stat_write_reqs_xdr=0;stat_write_success=549522;stat_write_errs=90;stat_xdr_pipe_writes=0;stat_xdr_pipe_miss=0;stat_delete_success=4;stat_rw_timeout=1862;udf_read_reqs=0;udf_read_success=0;udf_read_errs_other=0;udf_write_reqs=0;udf_write_success=0;udf_write_err_others=0;udf_delete_reqs=0;udf_delete_success=0;udf_delete_err_others=0;udf_lua_errs=0;udf_scan_rec_reqs=0;udf_query_rec_reqs=0;udf_replica_writes=0;stat_proxy_reqs=7021;stat_proxy_reqs_xdr=0;stat_proxy_success=2121;stat_proxy_errs=4739;stat_ldt_proxy=0;stat_cluster_key_err_ack_dup_trans_reenqueue=607;stat_expired_objects=0;stat_evicted_objects=0;stat_deleted_set_objects=0;stat_evicted_objects_time=0;stat_zero_bin_records=0;stat_nsup_deletes_not_shipped=0;stat_compressed_pkts_received=0;err_tsvc_requests=110;err_tsvc_requests_timeout=0;err_out_of_space=0;err_duplicate_proxy_request=0;err_rw_request_not_found=17;err_rw_pending_limit=19;err_rw_cant_put_unique=0;geo_region_query_count=0;geo_region_query_cells=0;geo_region_query_points=0;geo_region_query_falsepos=0;fabric_msgs_sent=58002818;fabric_msgs_rcvd=57998870;paxos_principal=BB92B00F00A0142;migrate_msgs_sent=55749290;migrate_msgs_recv=55759692;migrate_progress_send=0;migrate_progress_recv=0;migrate_num_incoming_accepted=7228;migrate_num_incoming_refused=0;queue=0;transactions=101978550;reaped_fds=6;scans_active=0;basic_scans_succeeded=0;basic_scans_failed=0;aggr_scans_succeeded=0;aggr_scans_failed=0;udf_bg_scans_succeeded=0;udf_bg_scans_failed=0;batch_index_initiate=40457778;batch_index_queue=0:0,0:0,0:0,0:0,0:0,0:0,0:0,0:0,0:0,0:0,0:0,0:0,0:0,0:0,0:0,0:0,0:0,0:0,0:0,0:0,0:0,0:0,0:0,0:0,0:0,0:0,0:0,0:0,0:0,0:0,0:0,0:0;batch_index_complete=40456708;batch_index_timeout=1037;batch_index_errors=33;batch_index_unused_buffers=256;batch_index_huge_buffers=217168717;batch_index_created_buffers=217583519;batch_index_destroyed_buffers=217583263;batch_initiate=0;batch_queue=0;batch_tree_count=0;batch_timeout=0;batch_errors=0;info_queue=0;delete_queue=0;proxy_in_progress=0;proxy_initiate=7021;proxy_action=5519;proxy_retry=0;proxy_retry_q_full=0;proxy_unproxy=0;proxy_retry_same_dest=0;proxy_retry_new_dest=0;write_master=551089;write_prole=1055431;read_dup_prole=14232;rw_err_dup_internal=0;rw_err_dup_cluster_key=1814;rw_err_dup_send=0;rw_err_write_internal=0;rw_err_write_cluster_key=0;rw_err_write_send=0;rw_err_ack_internal=0;rw_err_ack_nomatch=1767;rw_err_ack_badnode=0;client_connections=366;waiting_transactions=0;tree_count=0;record_refs=3739898;record_locks=0;migrate_tx_objs=0;migrate_rx_objs=0;ongoing_write_reqs=0;err_storage_queue_full=0;partition_actual=296;partition_replica=572;partition_desync=0;partition_absent=3228;partition_zombie=0;partition_object_count=3739898;partition_ref_count=4096;system_free_mem_pct=61;sindex_ucgarbage_found=0;sindex_gc_locktimedout=0;sindex_gc_inactivity_dur=0;sindex_gc_activity_dur=0;sindex_gc_list_creation_time=0;sindex_gc_list_deletion_time=0;sindex_gc_objects_validated=0;sindex_gc_garbage_found=0;sindex_gc_garbage_cleaned=0;system_swapping=false;err_replica_null_node=0;err_r
eplica_non_null_node=0;err_sync_copy_null_master=0;storage_defrag_corrupt_record=0;err_write_fail_prole_unknown=0;err_write_fail_prole_generation=0;err_write_fail_unknown=0;err_write_fail_key_exists=0;err_write_fail_generation=0;err_write_fail_generation_xdr=0;err_write_fail_bin_exists=0;err_write_fail_parameter=0;err_write_fail_incompatible_type=0;err_write_fail_noxdr=0;err_write_fail_prole_delete=0;err_write_fail_not_found=0;err_write_fail_key_mismatch=0;err_write_fail_record_too_big=90;err_write_fail_bin_name=0;err_write_fail_bin_not_found=0;err_write_fail_forbidden=0;stat_duplicate_operation=53184;uptime=1001388;stat_write_errs_notfound=0;stat_write_errs_other=90;heartbeat_received_self=0;heartbeat_received_foreign=145137042;query_reqs=0;query_success=0;query_fail=0;query_abort=0;query_avg_rec_count=0;query_short_running=0;query_long_running=0;query_short_queue_full=0;query_long_queue_full=0;query_short_reqs=0;query_long_reqs=0;query_agg=0;query_agg_success=0;query_agg_err=0;query_agg_abort=0;query_agg_avg_rec_count=0;query_lookups=0;query_lookup_success=0;query_lookup_err=0;query_lookup_abort=0;query_lookup_avg_rec_count=0
3 : features
cdt-list;pipelining;geo;float;batch-index;replicas-all;replicas-master;replicas-prole;udf
4 : cluster-generation
61
5 : partition-generation
11811
6 : edition
Aerospike Community Edition
7 : version
Aerospike Community Edition build 3.7.2
8 : build
3.7.2
9 : services
10.0.3.1:3000;10.240.0.14:3000;10.0.3.1:3000;10.240.0.27:3000;10.0.3.1:3000;10.240.0.5:3000;10.0.3.1:3000;10.240.0.43:3000;10.0.3.1:3000;10.240.0.30:3000;10.0.3.1:3000;10.240.0.18:3000;10.0.3.1:3000;10.240.0.42:3000;10.0.3.1:3000;10.240.0.33:3000;10.0.3.1:3000;10.240.0.24:3000;10.0.3.1:3000;10.240.0.37:3000;10.0.3.1:3000;10.240.0.41:3000;10.0.3.1:3000;10.240.0.13:3000;10.0.3.1:3000;10.240.0.23:3000
10 : services-alumni
10.0.3.1:3000;10.240.0.42:3000;10.0.3.1:3000;10.240.0.5:3000;10.0.3.1:3000;10.240.0.13:3000;10.0.3.1:3000;10.240.0.14:3000;10.0.3.1:3000;10.240.0.18:3000;10.0.3.1:3000;10.240.0.23:3000;10.0.3.1:3000;10.240.0.24:3000;10.0.3.1:3000;10.240.0.27:3000;10.0.3.1:3000;10.240.0.30:3000;10.0.3.1:3000;10.240.0.37:3000;10.0.3.1:3000;10.240.0.43:3000;10.0.3.1:3000;10.240.0.33:3000;10.0.3.1:3000;10.240.0.41:3000
I have a few comments about your configuration. First, transaction-threads-per-queue should be set to 3 or 4 (don't set it to the number of cores).
The second has to do with your batch-read tuning. You're using the (default) batch-index protocol, and the config params you'll need to tune for batch-read performance are:
You have batch-max-requests set very high. This is probably affecting both your CPU load and your memory consumption. Even a slight imbalance in the number of keys you're accessing per node will show up in the graphs you've shared; at least, this is possibly the issue. It's better to iterate over smaller batches than to try to fetch 200K records per node at a time (see the sketch after these tuning notes).
batch-index-threads – by default its value is 4, and you've set it to 32 (out of a maximum of 64). You should tune this incrementally, running the same test and benchmarking the performance each time: adjust it higher on each iteration, then back down if performance decreases. For example: test with 32, +8 = 40, +8 = 48, -4 = 44. There's no easy rule of thumb for this setting; you'll need to tune it through iterations on the hardware you'll be using and monitor the performance.
batch-max-buffer-per-queue – this is more directly linked to the number of concurrent batch-read operations the node can support. Each batch-read request consumes at least one buffer (more if the data cannot fit in 128K). If you do not have enough of these allocated to support the number of concurrent batch-read requests, you will get exceptions with error code 152 BATCH_QUEUES_FULL. Track and log such events clearly, because it means you need to raise this value. Note that this is the number of buffers per queue. Each batch response worker thread has its own queue, so you'll have batch-index-threads x batch-max-buffer-per-queue buffers, each taking 128K of RAM. The batch-max-unused-buffers parameter caps the combined memory usage of all these buffers, destroying unused buffers until their number is reduced. There's an overhead to allocating and destroying these buffers, so you do not want to set it too low compared to the total. Your current cost is 32 x 256 x 128KB = 1GB.
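To act on the batch-max-requests point above, the application can split a large key list into smaller batch reads instead of requesting 200K keys per node at once. A rough sketch with the Python client's get_many (the host, namespace, set, and batch size are illustrative):

import aerospike

config = {'hosts': [('10.240.0.6', 3000)]}
client = aerospike.client(config).connect()

def batched_get(keys, batch_size=5000):
    """Read records in chunks instead of one huge batch request."""
    records = []
    for i in range(0, len(keys), batch_size):
        chunk = keys[i:i + batch_size]
        # get_many returns a list of (key, meta, bins) tuples;
        # meta/bins are None for keys that were not found.
        records.extend(client.get_many(chunk))
    return records

# Example: placeholder keys in namespace 'test', set 'demo'.
keys = [('test', 'demo', 'user-%d' % i) for i in range(100000)]
results = batched_get(keys)
client.close()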
Finally, you're storing your data on a filesystem. That's fine for development instances, but not recommended for production. In GCE you can provision either a SATA SSD or an NVMe SSD for your data storage, and those should be initialized, and used as block devices. Take a look at the GCE recommendations for more details. I suspect you have warnings in your log about the device not keeping up.
It's likely that one of your nodes is an outlier with regard to the number of partitions it has (and therefore the number of objects). You can confirm this with asadm -e 'asinfo -v "objects"'. If that's the case, you can terminate that node and bring up a new one, which forces the partitions to be redistributed. This does trigger a migration, which takes quite a bit longer on the CE server than on the EE one.
For anyone interested, Aerospike Enterprise 4.3 introduced 'uniform-balance', which homogeneously balances data partitions. Read more here: https://www.aerospike.com/blog/aerospike-4-3-all-flash-uniform-balance/
I get the following error while storing data to Aerospike (client.put). I have enough space on the drive.
Aerospike: Failed to store record. Error: (13L, 'AEROSPIKE_ERR_RECORD_TOO_BIG', 'src/main/client/put.c', 106).
Here is my Aerospike server namespace configuration
namespace test {
    replication-factor 1
    memory-size 1G
    default-ttl 30d # 30 days, use 0 to never expire/evict.
    storage-engine device {
        file /opt/aerospike/data/test.dat
        filesize 2G
        data-in-memory true # Store data in memory in addition to file.
    }
}
By default, namespaces have a write-block-size of 1 MiB. This is also the maximum configurable size, and it limits the maximum object size the application is able to write to Aerospike.
If you need to go beyond 1 MiB see Large Data Types as a possible solution.
UPDATE 2019/09/06
Since Aerospike 3.16, the write-block-size limit has been increased from 1 MiB to 8 MiB.
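On the client side this failure surfaces as error code 13 and, with the Python client, as a dedicated exception type that can be handled explicitly. A small sketch (the host, key, and payload are placeholders):

import aerospike
from aerospike import exception as ex

config = {'hosts': [('127.0.0.1', 3000)]}
client = aerospike.client(config).connect()

key = ('test', 'demo', 'big-record')            # placeholder key
payload = 'x' * (2 * 1024 * 1024)               # ~2 MiB, larger than a 1 MiB write-block-size

try:
    client.put(key, {'data': payload})
except ex.RecordTooBig as e:
    # Error code 13 (AEROSPIKE_ERR_RECORD_TOO_BIG): the record exceeds write-block-size.
    print('record too big: %s (code %d)' % (e.msg, e.code))
finally:
    client.close()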
Yes, but unfortunately Aerospike has deprecated LDTs (https://www.aerospike.com/blog/aerospike-ldt/). They now recommend using Lists or Maps, but as stated in their post:
"the new implementation does not solve the problem of the 1MB Aerospike database row size limit. A future key feature of the product will be an enhanced implementation that transcends the 1MB limit for a number of types"
In other words, it is still an unsolved problem when storing your data on SSD or HDD. However, you can store larger records in in-memory namespaces.
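Since the recommendation above is to use native Lists or Maps instead of LDTs, here is a hedged sketch of working with a list bin via the Python client (the host, key, and bin names are placeholders); the whole record must still fit within the namespace's write-block-size:

import aerospike

config = {'hosts': [('127.0.0.1', 3000)]}
client = aerospike.client(config).connect()

key = ('test', 'demo', 'events:42')             # placeholder key

# Append an item to a list bin; the bin is created if it does not exist.
client.list_append(key, 'events', {'ts': 1568700000, 'type': 'click'})

# Read back only the list size rather than the whole record.
size = client.list_size(key, 'events')
print('events list now holds %d items' % size)

client.close()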
I have a strange problem writing data to the Aerospike cluster:
aql> insert into storebig.Chunks (PK,Data) values ('5cb138284d431abd6a053a56625ec088bfb88912', '1234567890')
OK, 1 record affected.
aql> select * from storebig.Chunks where PK = '5cb138284d431abd6a053a56625ec088bfb88912'
Error: (2) AEROSPIKE_ERR_RECORD_NOT_FOUND
aql> insert into storebig.Chunks (PK,Data) values ('5cb138284d431abd6a053a56625ec088bfb88912', '1234567890')
Error: (1) AEROSPIKE_ERR_SERVER
Same story with the golang client library (of course)
It is very possible the cluster is not healthy - some strange messages appear in the server logs:
May 06 2015 12:17:49 GMT: WARNING (drv_ssd): (drv_ssd.c::1236) read: read wrong key: expecting de6f0bc93bfdf560 got 8ad3dd7fce1ac7ec
May 06 2015 12:17:49 GMT: WARNING (drv_ssd): (drv_ssd.c::1236) read: read wrong key: expecting de6f0bc93bfdf560 got 8ad3dd7fce1ac7ec
May 06 2015 12:17:50 GMT: WARNING (drv_ssd): (drv_ssd.c::1230) read: bad block magic offset 29843600384
May 06 2015 12:17:50 GMT: WARNING (drv_ssd): (drv_ssd.c::1230) read: bad block magic offset 29843600384
My question is: what can I do to investigate the situation, debug and recover? Where to look and what to try?
Thank you.
With best regards,
Daniel Podolsky
UPDATE
Config template (the actual config is generated from this template when the Docker container starts):
service {
    user root
    group root
    paxos-single-replica-limit 1
    pidfile /var/run/aerospike/asd.pid
    service-threads 4
    transaction-queues 4
    transaction-threads-per-queue 4
    proto-fd-max 15000
}

logging {
    file /storage/logs/aerospike.log {
        context any info
    }
    console {
        context any info
    }
}

network {
    service {
        address <%=os.getenv("NODE_EXT_ADDR")%>
        port 3000
    }
    fabric {
        address <%=os.getenv("NODE_INT_ADDR")%>
        port 3001
    }
    heartbeat {
        mode multicast
        address 239.1.99.2
        port 9918
        interface-address <%=os.getenv("NODE_INT_ADDR")%>
        interval 150
        timeout 10
    }
    info {
        address <%=os.getenv("NODE_INT_ADDR")%>
        port 3003
    }
}

namespace storebig {
    replication-factor 3
    memory-size <%=os.getenv("MEM_USE_BIG")%>K
    default-ttl 0
    high-water-disk-pct 98
    high-water-memory-pct 98
    stop-writes-pct 95
    storage-engine device {
        file /storage/data/big.dat
        filesize 3T
        data-in-memory false
    }
}

namespace storefast {
    replication-factor 3
    memory-size <%=os.getenv("MEM_USE_FAST")%>K
    default-ttl 0
    high-water-disk-pct 98
    high-water-memory-pct 98
    stop-writes-pct 95
    storage-engine device {
        file /storage/data/fast.dat
        filesize <%=os.getenv("MEM_USE_FAST")%>K
        data-in-memory true
    }
}

namespace storetest {
    replication-factor 3
    memory-size <%=os.getenv("MEM_USE_FAST")%>K
    default-ttl 0
    high-water-disk-pct 98
    high-water-memory-pct 98
    stop-writes-pct 95
    storage-engine device {
        file /storage/data/test.dat
        filesize 3T
        data-in-memory false
    }
}
After reading over your configuration, I believe I have found your problem. Individual devices and files in Aerospike can be no larger than 2 TiB, and yours are configured at 3 TiB. Regrettably, there currently isn't a check in the config parser for this limit, and I am unable to find a reference in our docs - both of these issues are being taken care of.
You can instead use multiple files to store the data for each namespace (each file limited to 2 TiB). As discussed elsewhere, you will likely see better performance by using multiple files or devices for a given namespace.
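For example, the storebig namespace above could be declared with several smaller files, each at or below the 2 TiB limit. A hedged sketch (file names and sizes are illustrative; filesize applies to each file individually, and other namespace settings are left as in the original):

namespace storebig {
    replication-factor 3
    memory-size <%=os.getenv("MEM_USE_BIG")%>K
    default-ttl 0
    storage-engine device {
        # Two 1.5 TiB files instead of a single 3 TiB file.
        file /storage/data/big-0.dat
        file /storage/data/big-1.dat
        filesize 1536G
        data-in-memory false
    }
}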
Reading the Aerospike manual, there is no limitation on device size, only on file size (2 TB maximum):
Manual:
Recipe for an SSD Storage Engine
The minimal configuration for an SSD namespace requires setting storage-engine to device and adding a device parameter for each SSD to be used by this namespace. In addition, memory-size may need to be changed from the default of 4GB to a size appropriate for the expected primary index size. For assistance in sizing the primary index, please refer to the Sizing Guide. For performance, we recommend reducing the write-block-size from the default of 1MB to 128 Kb on SSD backed namespaces.
Recipe for a HDD Storage Engine with Data in Memory
The minimal configuration for an HDD with Data-in-Memory namespace involves setting storage-engine to device, setting data-in-memory to true, and finally providing a list of file parameters to indicate where data will be persisted. Also filesize needs to be large enough to support the size of the data on disk (with a maximum allowed value of 2 TiB). Lastly, memory-size may need adjusted from the default of 4GB to a size appropriate to handle the expected primary index size and the expected size of the data in memory. For assistance sizing filesize or memory-size please refer to our Sizing Guide.