Aerospike - "n_bytes_memory went negative" with in-memory-only namespace with TTL

We have a namespace configured to store data in memory only, with a default TTL of a couple of minutes. After we start putting data into it and expiration kicks in, we get these messages in the log (a lot of them, for ~30% of expired records):
WARNING (namespace): (namespace.c::762) set_id 1 - n_bytes_memory went negative!
I have a simple client app with a server config that reproduces this: https://github.com/akkomar/aerospike-test (it's based on Docker and is very easy to start).
Any advice on what might be the reason?
Edit:
I checked this on versions 3.6.4, 3.7.0.1 and 3.7.4
Configuration file used for testing (from https://github.com/akkomar/aerospike-test/blob/master/etc/aerospike.conf):
service {
    user root
    group root
    paxos-single-replica-limit 1
    pidfile /var/run/aerospike/asd.pid
    service-threads 4
    transaction-queues 4
    transaction-threads-per-queue 4
    proto-fd-max 1024
}

logging {
    file /var/log/aerospike/aerospike.log {
        context any info
    }
    console {
        context any info
        context namespace detail
    }
}

network {
    service {
        address any
        port 3000
    }
    heartbeat {
        mode mesh
        port 3002
        mesh-port 3002
        interval 150
        timeout 10
    }
    fabric {
        port 3001
    }
    info {
        port 3003
    }
}

namespace test_ns {
    replication-factor 2
    memory-size 1G
    default-ttl 10S
    storage-engine memory
}
Edit2:
It seems that it happens only when I update records via a UDF. The simplest one that reproduces this:
local VAL_KEY = "v"

function add_data(rec, val_to_add, ttl_to_set)
    if aerospike:exists(rec) then
        rec[VAL_KEY] = val_to_add
        aerospike:update(rec)
    else
        rec[VAL_KEY] = val_to_add
        aerospike:create(rec)
    end
end
When I execute the same operation via the Java API, everything seems to work fine (the example GitHub repo mentioned earlier has been updated with a Java API example).
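Roughly, the Java API version does something like this (a simplified sketch, not the exact code from the repo; the host, set name, and key are placeholders):
import com.aerospike.client.AerospikeClient;
import com.aerospike.client.Bin;
import com.aerospike.client.Key;
import com.aerospike.client.policy.WritePolicy;

public class JavaApiWrite {
    public static void main(String[] args) {
        // connect to a local node (host/port are placeholders)
        AerospikeClient client = new AerospikeClient("127.0.0.1", 3000);
        try {
            WritePolicy policy = new WritePolicy();
            policy.expiration = 10; // record TTL in seconds, matching default-ttl 10S

            // same bin ("v") and namespace (test_ns) as the UDF; set name is a placeholder
            Key key = new Key("test_ns", "test_set", "some-key");
            client.put(policy, key, new Bin("v", 1));
        } finally {
            client.close();
        }
    }
}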

The error message means that the memory space we have accounted for that set went negative, which should not be possible.
This has been logged in our internal bug-tracking system for resolution in a future release.

It turned out to be a bug in Aerospike.
It is fixed in version 3.7.4.1 (detailed explanation at https://discuss.aerospike.com/t/problem-with-expiring-records-in-memory-only-namespace-n-bytes-memory-went-negative/2560/6).

Related

Syslog-ng logs not processing certain logs possibly due to journal cursor issue

I'm using syslog-ng 3.37.1 on a VMware Photon 3.0 virtual appliance (preconfigured VM). The appliance is configured to write logs into certain files under /var/log folder as well as to remote syslog servers (optional).
Logs from the 'auth' and 'authpriv' facilities are configured to be written to /var/log/auth.log, as well as sent to a remote syslog server when enabled.
In addition, there are other messages from the kernel, systemd services, and other processes, which are also configured to be processed via syslog-ng.
The issue is that logs from a few facilities (such as auth, authpriv, cron, etc.) are not processed (received?) by syslog-ng initially. So any SSH events or TTY login events are not logged to the file or to the remote server. However, many other events from the kernel, systemd, and other processes are logged fine.
Below is the configuration for auth.log, which does not log on the first boot:
filter f_auth { facility(auth) or facility(authpriv); };
destination authlog { file("/var/log/auth.log" perm(0600)); };
log { source(s_local); filter(f_auth); destination(authlog); };
I updated the filter as below, without any success:
filter f_auth {
    facility(auth) or facility(authpriv) or
    match('sshd' value('PROGRAM')) or match('systemd-logind' value('PROGRAM'));
};
In the journal I can see the relevant logs, for example with the command below to view SSH logs:
journalctl -f -u sshd
An additional syslog-ng service restart or config reload during appliance startup does not fix this.
The log file /var/log/auth.log (and also the cron log, etc.) shows zero size during this time. The syslog-ng log looks fine too.
However, if I generate an auth-facility event (say, an SSH/TTY login) and manually restart syslog-ng, all the log entries (including the old events) are immediately written to the filesystem log (/var/log/auth.log) and also sent to the remote syslog server.
In syslog-ng.log I find the entry below when it starts working that way:
syslog-ng[481]: [date] Failed to seek journal to the saved cursor position; cursor='', error='Invalid argument (22)'
It makes me wonder if it is due to a bad cursor position. However, I can still see other systemd and kernel logs being logged fine, so I'm not sure.
What could be causing such behaviour? How can I ensure that syslog-ng is able to receive and process these logs without manual intervention?
Below is more detailed configuration for reference:
#version: 3.37
#include "scl.conf"
source s_local {
    system();
    internal();
    udp();
};

destination d_local {
    file("/var/log/messages");
    file("/var/log/messages-kv.log" template("$ISODATE $HOST $(format-welf --scope all-nv-pairs)\n") frac-digits(3));
};

log {
    source(s_local);
    # uncomment this line to open port 514 to receive messages
    #source(s_network);
    destination(d_local);
};
filter f_auth {
    facility(auth) or facility(authpriv); # Also tried facility(auth, authpriv)
};
destination authlog { file("/var/log/auth.log" perm(0600)); };
log { source(s_local); filter(f_auth); destination(authlog); };
destination d_kern { file("/dev/console" perm(0600)); };
filter f_kern { facility(kern); };
log { source(s_local); filter(f_kern); destination(d_kern); };
destination d_cron { file("/var/log/cron" perm(0600)); };
filter f_cron { facility(cron); };
log { source(s_local); filter(f_cron); destination(d_cron); };
destination d_syslogng { file("/var/log/syslog-ng.log" perm(0600)); };
filter f_syslogng { program(syslog-ng); };
log { source(s_local); filter(f_syslogng); destination(d_syslogng); };
# A few more of above kind of configuration follows here.
# Add configuration files that have remote destination, filter and log configuration for remote servers
#include "remote/*.conf"
As can be seen, /var/log/auth.log should hold logs from the auth facility, but the log remains blank until a subsequent restart of syslog-ng after a syslog config change (via API) or a manual login to the system. However, triggering an automated restart of syslog-ng via cron (without an additional syslog config change) does not help.
Any thoughts, suggestions?
This is probably caused by your real-time clock going backwards. The notification mechanism in libsystemd does not work in this case.
There's a proof-of-concept patch in this syslog-ng issue:
https://github.com/syslog-ng/syslog-ng/issues/2836
But I've increased the priority of tackling that problem and fixing this, as it is causing issues more often than I anticipated.
As a workaround, you should synchronize the time in your VM, preferably so that the boot waits until the clock is synced, and then keep the time synchronized via NTP.
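For example, on a systemd-based guest with systemd-timesyncd available (this is an assumption about the appliance; use the chrony or ntpd equivalents if that is what it ships):
# enable NTP synchronization in the guest
timedatectl set-ntp true
# make the boot sequence wait until the clock has actually been synchronized
systemctl enable systemd-time-wait-sync.service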

Heroku crashes after Heroku Redis upgrade from Hobby to Premium 0

Problem
When I upgraded from the Heroku Redis Hobby plan to the Heroku Redis Premium 0 plan, Heroku kept crashing with an H10 error.
Cause
Redis 6 requires TLS to connect. However, Heroku manages requests from the router level to the application level using self-signed certificates. It turns out Heroku terminates SSL at the router level, and requests are forwarded from there to the application via HTTP, while everything stays behind Heroku's firewall and other security measures.
Links that helped track down the cause:
https://ogirginc.github.io/en/heroku-redis-ssl-error
How to enable TLS for Redis 6 on Sidekiq?
Solution
Customize the options passed into Redis so that tls.rejectUnauthorized is set to false:
const Queue = require('bull');
const redisUrlParse = require('redis-url-parse');

const REDIS_URL = process.env.REDIS_URL || 'redis://127.0.0.1:6379';
const redisUrlParsed = redisUrlParse(REDIS_URL);
const { host, port, password } = redisUrlParsed;
const bullOptions = REDIS_URL.includes('rediss://')
  ? {
      redis: {
        port: Number(port),
        host,
        password,
        tls: {
          rejectUnauthorized: false,
        },
      },
    }
  : REDIS_URL;

const workQueue = new Queue('work', bullOptions);
Adding (on top of yeoman's great answer):
If you find yourself hitting SSL verification errors on Heroku when using django-rq and the latest Redis add-on, know that the RQ_QUEUES definition in Django's settings.py supports SSL_CERT_REQS, and you can set it to None specifically to solve these issues (inspired by https://paltman.com/how-to-turn-off-ssl-verify-django-rq-heroku-redis/).
Note that this requires bumping django-rq to a version >= 2.5.1.
This may be relevant for all users who use Redis only for queuing (e.g. with RQ) and not for caching.
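A minimal sketch of what that can look like in settings.py (the queue name and environment variable are just examples; exact keys depend on your django-rq version):
# settings.py -- requires django-rq >= 2.5.1
import os

RQ_QUEUES = {
    "default": {  # example queue name
        "URL": os.environ.get("REDIS_URL", "redis://localhost:6379/0"),
        "SSL_CERT_REQS": None,  # skip certificate verification (Heroku's self-signed certs)
    },
}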

How to use EKS with suitable volumes and resolve an insufficient subnet IP issue on AWS?

I deployed an application in EKS. The deployment is always pending, and when I checked the events I found these issues:
$ kubectl get events
LAST SEEN TYPE REASON OBJECT MESSAGE
89s Warning FailedScheduling pod/awx-demo-111111111-122222 running PreBind plugin "VolumeBinding": binding volumes: provisioning failed for PVC "awx-demo-projects-claim"
49m Warning FailedDeployModel ingress/awx-demo-ingress Failed deploy model due to InvalidSubnet: Not enough IP space available in subnet-031f9c702bc474e8f. ELB requires at least 8 free IP addresses in each subnet.
status code: 400, request id: 11111111-2222-3333-4444-555555555555
32m Warning FailedDeployModel ingress/awx-demo-ingress Failed deploy model due to InvalidSubnet: Not enough IP space available in subnet-01322i912fas0123na. ELB requires at least 8 free IP addresses in each subnet.
status code: 400, request id: 11111111-2222-3333-4444-555555555515
15m Warning FailedDeployModel ingress/awx-demo-ingress Failed deploy model due to InvalidSubnet: Not enough IP space available in subnet-031f9c702bc474e8f. ELB requires at least 8 free IP addresses in each subnet.
status code: 400, request id: 11111111-2222-3333-4444-555555555525
89s Normal WaitForPodScheduled persistentvolumeclaim/awx-demo-projects-claim waiting for pod awx-demo-111111111-122222 to be scheduled
21m Warning ProvisioningFailed persistentvolumeclaim/awx-demo-projects-claim Failed to provision volume with StorageClass "gp2": invalid AccessModes [ReadWriteMany]: only AccessModes [ReadWriteOnce] are supported
It seems there are a volume issue and a subnet issue. I created the EKS cluster and node group with these configurations:
resource "aws_eks_cluster" "this" {
encryption_config {
resources = ["secrets"]
provider {
key_arn = aws_kms_key.this.arn
}
}
enabled_cluster_log_types = ["api", "authenticator", "audit", "scheduler", "controllerManager"]
name = local.cluster_name
version = "1.20"
role_arn = aws_iam_role.eks_cluster.arn
vpc_config {
subnet_ids = [
data.aws_ssm_parameter.private_subnet_0_id.value,
data.aws_ssm_parameter.private_subnet_1_id.value,
]
security_group_ids = [aws_security_group.this.id]
endpoint_public_access = true
}
depends_on = [
aws_iam_role_policy_attachment.eks_cluster_policy,
aws_iam_role_policy_attachment.eks_vpc_resource_controller,
aws_iam_role_policy_attachment.eks_service_policy,
]
tags = merge(
local.tags,
)
}
resource "aws_eks_node_group" "this" {
cluster_name = local.cluster_name
node_group_name = local.node_group_name
node_role_arn = aws_iam_role.eks_nodes.arn
instance_types = ["m5.2xlarge"]
subnet_ids = [
data.aws_ssm_parameter.private_subnet_0_id.value,
data.aws_ssm_parameter.private_subnet_1_id.value,
]
scaling_config {
desired_size = 2
max_size = 2
min_size = 2
}
lifecycle {
ignore_changes = [scaling_config[0].desired_size]
}
depends_on = [
aws_iam_role_policy_attachment.eks_worker_node_policy,
aws_iam_role_policy_attachment.eks_cni_policy,
aws_iam_role_policy_attachment.ec2_container_register_readonly,
]
tags = merge(
local.tags,
)
}
I didn't define the volume type for EBS, so maybe it's using the default setting. How can I fix the issue?
For the insufficient subnet IP issue: if I create a new subnet for EKS to use, is it necessary to delete the EKS cluster or node group?
By the way, the deployment I used was https://raw.githubusercontent.com/ansible/awx-operator/0.13.0/deploy/awx-operator.yaml, installed following https://github.com/ansible/awx-operator#basic-install.
@miantian, continuing our discussion from the comments:
A subnet's size cannot simply be increased. If you change the subnet size, the subnet will be recreated, but since the EKS cluster is already there, the subnet creation will fail. So I would say: start fresh. Delete everything and then rebuild from scratch.
Regarding the volume issue: by default, EKS only supports the ReadWriteOnce access mode. This is because of a technical limitation of AWS, where an EBS volume can only be attached to one EC2 instance. If you want to use the ReadWriteMany access mode, you need to use EFS.
If you want to use EFS, look up the NFS/EFS client provisioner for EKS. There are a few steps you need to follow in order to create an EFS provisioner in EKS; after that, you can start using the ReadWriteMany access mode, as sketched below.
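As a rough illustration, here is what dynamic ReadWriteMany provisioning can look like with the AWS EFS CSI driver (a sketch under the assumption that the driver is installed and that fs-12345678 is your EFS file system ID; the older efs-provisioner setup differs slightly):
# StorageClass backed by EFS
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: efs-sc
provisioner: efs.csi.aws.com
parameters:
  provisioningMode: efs-ap
  fileSystemId: fs-12345678   # placeholder file system ID
  directoryPerms: "700"
---
# PVC that can now request ReadWriteMany
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: example-rwx-claim     # placeholder name
spec:
  accessModes:
    - ReadWriteMany
  storageClassName: efs-sc
  resources:
    requests:
      storage: 8Gi            # required by the API, not enforced by EFS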

Aerospike heartbeat configuration for single server, error "Unable to find any suitable network device for node ID"

I want to run Aerospike server in single-server mode.
Now I have this configuration:
service {
    paxos-single-replica-limit 1 # Number of nodes where the replica count is automatically reduced to 1.
    service-threads 4
    transaction-queues 4
    transaction-threads-per-queue 4
    proto-fd-max 15000
}

logging {
    console {
        context any info
    }
}

network {
    service {
        address 127.0.0.1
        port 3000
    }
    heartbeat {
        mode multicast
        multicast-group 239.1.99.222
        port 9918
        # To use unicast-mesh heartbeats, remove the 3 lines above, and see
        # aerospike_mesh.conf for an alternative.
        interval 150
        timeout 10
    }
    fabric {
        port 3001
    }
    info {
        port 3003
    }
}

namespace test {
    replication-factor 1
    memory-size 20M
    default-ttl 1d # 1 day; use 0 to never expire/evict.
    storage-engine memory
}
And when I try to start the server, I get this error in the log:
"Unable to find any suitable network device for node ID"
I don't want the server to be available to the internet.
How can I achieve this and fix the error?
The node ID is generated using the MAC address of an interface on the host.
https://github.com/aerospike/aerospike-server/blob/master/cf/src/socket.c#L2470
If you don't have any of the default interface names that Aerospike is aware of, you might get this error.
To fix this problem, you can specify your interface name.
http://www.aerospike.com/docs/operations/troubleshoot/startup#problem-with-network-interface
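For example, something along these lines in the config (the exact parameter depends on your server version, and eth0 is just an example interface name taken from the output of ip link):
service {
    # ... existing service settings ...
    # derive the node ID from this interface's MAC address
    node-id-interface eth0
}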
To avoid exposing your Aerospike node to the internet, you can bind it only to localhost or to a private interface, or use other network tools/devices, such as a firewall or ACLs, to avoid exposing the server port. The best way to avoid exposing Aerospike to the internet is to ensure that the server hosting Aerospike is not itself exposed to the internet. If that is not doable, restrict access to your Aerospike port to your Aerospike clients' IPs using a firewall. You can also use database credentials, available in the Enterprise Edition.
http://www.aerospike.com/docs/guide/security.html

Tuning the JVM on CloudBees

Well, I have an app running on CloudBees that needs some extra Java memory (the app uses Hibernate and Spring).
Reading other posts and the CloudBees documentation, I think the way to change the max and min JVM memory is like this: bees app:deploy -a account/appId -R JAVA_OPTS="-Xms512m -Xmx512m /target/app.ear", but when I do this and try to run the app, it throws the following exception:
Error occurred during initialization of VM
Incompatible minimum and maximum heap sizes specified
What am I doing wrong, and what can I do to resolve this problem?
In addition, I'm using JBoss, and when I run "bees app:info" the output is the following:
Application : account/appId
Title : account/appId
Created : Mon Aug 04 11:49:18 EDT 2014
Status : active
URL : ...
clusterSize : 1
container : java_small
containerType : jboss71
hibernateTimeout: 7200
jvmPermSize : 256
maxMemory : 256
proxyBuffering : false
securityMode : PUBLIC
Thanks
Well, finally I found my error.
I solved this problem by paying for a CloudBees account. A free account doesn't allow increasing the JVM memory above 256 MB, so when I tried to set -Xms512m, the minimum heap size exceeded the maximum.
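For reference, if you stay on the free plan you have to keep the heap within that 256 MB limit instead, for example (same command format as in the question; the sizes are just examples):
bees app:deploy -a account/appId -R JAVA_OPTS="-Xms128m -Xmx256m" target/app.ear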