How to use EKS with suitable volumes and resolve an insufficient subnet IP issue on AWS? - amazon-eks

I deployed an application in EKS. The deployment was always pending, and when I checked the events I found these issues.
$ kubectl get events
LAST SEEN TYPE REASON OBJECT MESSAGE
89s Warning FailedScheduling pod/awx-demo-111111111-122222 running PreBind plugin "VolumeBinding": binding volumes: provisioning failed for PVC "awx-demo-projects-claim"
49m Warning FailedDeployModel ingress/awx-demo-ingress Failed deploy model due to InvalidSubnet: Not enough IP space available in subnet-031f9c702bc474e8f. ELB requires at least 8 free IP addresses in each subnet.
status code: 400, request id: 11111111-2222-3333-4444-555555555555
32m Warning FailedDeployModel ingress/awx-demo-ingress Failed deploy model due to InvalidSubnet: Not enough IP space available in subnet-01322i912fas0123na. ELB requires at least 8 free IP addresses in each subnet.
status code: 400, request id: 11111111-2222-3333-4444-555555555515
15m Warning FailedDeployModel ingress/awx-demo-ingress Failed deploy model due to InvalidSubnet: Not enough IP space available in subnet-031f9c702bc474e8f. ELB requires at least 8 free IP addresses in each subnet.
status code: 400, request id: 11111111-2222-3333-4444-555555555525
89s Normal WaitForPodScheduled persistentvolumeclaim/awx-demo-projects-claim waiting for pod awx-demo-111111111-122222 to be scheduled
21m Warning ProvisioningFailed persistentvolumeclaim/awx-demo-projects-claim Failed to provision volume with StorageClass "gp2": invalid AccessModes [ReadWriteMany]: only AccessModes [ReadWriteOnce] are supported
It seems there are a volume issue and a subnet issue. I created the EKS cluster and node group with these configurations:
resource "aws_eks_cluster" "this" {
encryption_config {
resources = ["secrets"]
provider {
key_arn = aws_kms_key.this.arn
}
}
enabled_cluster_log_types = ["api", "authenticator", "audit", "scheduler", "controllerManager"]
name = local.cluster_name
version = "1.20"
role_arn = aws_iam_role.eks_cluster.arn
vpc_config {
subnet_ids = [
data.aws_ssm_parameter.private_subnet_0_id.value,
data.aws_ssm_parameter.private_subnet_1_id.value,
]
security_group_ids = [aws_security_group.this.id]
endpoint_public_access = true
}
depends_on = [
aws_iam_role_policy_attachment.eks_cluster_policy,
aws_iam_role_policy_attachment.eks_vpc_resource_controller,
aws_iam_role_policy_attachment.eks_service_policy,
]
tags = merge(
local.tags,
)
}
resource "aws_eks_node_group" "this" {
cluster_name = local.cluster_name
node_group_name = local.node_group_name
node_role_arn = aws_iam_role.eks_nodes.arn
instance_types = ["m5.2xlarge"]
subnet_ids = [
data.aws_ssm_parameter.private_subnet_0_id.value,
data.aws_ssm_parameter.private_subnet_1_id.value,
]
scaling_config {
desired_size = 2
max_size = 2
min_size = 2
}
lifecycle {
ignore_changes = [scaling_config[0].desired_size]
}
depends_on = [
aws_iam_role_policy_attachment.eks_worker_node_policy,
aws_iam_role_policy_attachment.eks_cni_policy,
aws_iam_role_policy_attachment.ec2_container_register_readonly,
]
tags = merge(
local.tags,
)
}
I didn't define the volume type for EBS, so it is probably using the default settings. How can I fix this issue?
For the insufficient IP address issue in the VPC: if I create a new subnet for EKS to use, is it necessary to delete the EKS cluster or node group?
By the way, the deployment I used was https://raw.githubusercontent.com/ansible/awx-operator/0.13.0/deploy/awx-operator.yaml.
The install followed https://github.com/ansible/awx-operator#basic-install.

#miantian, Continuing our discussion from the comments:
A subnet's size cannot simply be increased. If you change the subnet size, the subnet will be destroyed and recreated, and because the EKS cluster is still attached to it, that recreation will fail. So I would say: delete everything and start fresh.
Regarding the volume issue: by default EKS only supports the ReadWriteOnce access mode, because the gp2 StorageClass is backed by EBS and an EBS volume can only be attached to one EC2 instance at a time. If you want to use the ReadWriteMany access mode, you need to use EFS.
If you want to use EFS, look up the NFS/EFS client provisioner for EKS. There are a few steps you need to follow in order to create an EFS provisioner in EKS; after that, you can start using the ReadWriteMany access mode. A rough Terraform sketch of the EFS side is shown below.
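The following is only a hypothetical sketch, not part of the original setup: it assumes the two existing private subnets, reuses the cluster security group (aws_security_group.this), and you would still need to install an EFS provisioner (or the EFS CSI driver) in the cluster and create a matching StorageClass before switching the PVC to ReadWriteMany.
# Hypothetical sketch: an EFS file system with a mount target in each private
# subnet used by the node group, so pods on either node can mount it.
resource "aws_efs_file_system" "eks" {
  encrypted = true
  tags      = local.tags
}

resource "aws_efs_mount_target" "private_0" {
  file_system_id = aws_efs_file_system.eks.id
  subnet_id      = data.aws_ssm_parameter.private_subnet_0_id.value
  # Assumed: this security group allows NFS (TCP 2049) from the worker nodes.
  security_groups = [aws_security_group.this.id]
}

resource "aws_efs_mount_target" "private_1" {
  file_system_id  = aws_efs_file_system.eks.id
  subnet_id       = data.aws_ssm_parameter.private_subnet_1_id.value
  security_groups = [aws_security_group.this.id]
}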

Related

Heroku crashes after Heroku Redis upgrade from Hobby to Premium 0

Problem
When I upgraded from the Heroku Redis Hobby plan to the Heroku Redis Premium 0 plan, Heroku kept crashing with an H10 error.
Cause
Redis 6 requires TLS to connect. However, Heroku manages requests from the router level to the application level using self-signed certificates. It turns out Heroku terminates SSL at the router level, and requests are forwarded from there to the application via HTTP while everything stays behind Heroku's firewall and security measures.
Links that helped track down the cause:
https://ogirginc.github.io/en/heroku-redis-ssl-error
How to enable TLS for Redis 6 on Sidekiq?
Solution
Customize the options passed into Redis so that tls.rejectUnauthorized is set to false.
const Queue = require('bull');
const redisUrlParse = require('redis-url-parse');

const REDIS_URL = process.env.REDIS_URL || 'redis://127.0.0.1:6379';
const redisUrlParsed = redisUrlParse(REDIS_URL);
const { host, port, password } = redisUrlParsed;

const bullOptions = REDIS_URL.includes('rediss://')
  ? {
      redis: {
        port: Number(port),
        host,
        password,
        tls: {
          rejectUnauthorized: false,
        },
      },
    }
  : REDIS_URL;

const workQueue = new Queue('work', bullOptions);
Adding (on top of yeoman's great answer):
If you find yourself hitting SSL verification errors on Heroku when using django-rq and the latest Redis add-on, know that the RQ_QUEUES definition in Django's settings.py supports SSL_CERT_REQS, and you can set it to None specifically to solve these issues (a sketch follows below).
(Inspired by https://paltman.com/how-to-turn-off-ssl-verify-django-rq-heroku-redis/.)
Note that this requires bumping django-rq to a version >= 2.5.1.
That might be relevant for all users who use Redis only for queuing (e.g. with RQ) and not for caching.
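A minimal sketch of such a settings.py entry, assuming django-rq >= 2.5.1 and the Heroku-provided REDIS_URL environment variable (the queue name and timeout value are illustrative, not from the original answer):
# settings.py (sketch) -- requires django-rq >= 2.5.1 for SSL_CERT_REQS support
import os

RQ_QUEUES = {
    'default': {
        'URL': os.environ.get('REDIS_URL'),
        'DEFAULT_TIMEOUT': 360,   # illustrative value
        'SSL_CERT_REQS': None,    # do not verify Heroku's self-signed certificate
    },
}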

SSL Certification Verify Failed on Heroku Redis

I'm deploying a Flask app on Heroku using a Redis premium plan. I get the following error: 'SSL Certification Verify Failed'. Attempted fixes:
Downgrading to Redis 5
Passing ssl_cert_reqs=None to the Redis constructor in redis-py
A solution to this problem would either:
explain how to disable TLS certificate verification on Heroku Redis premium plans, or
explain how to make TLS certificate verification work on Heroku Redis premium plans.
From Heroku's docs, this may be a hint: 'you must enable TLS in your Redis client’s configuration in order to connect to a Redis 6 database'. I don't understand what this means.
I solved my problem by adding ?ssl_cert_reqs=CERT_NONE to the end of REDIS_URL in my Heroku config.
You can disable TLS certification on Heroku by downgrading to Redis 5 and passing ssl_cert_reqs=None to the Redis constructor.
$ heroku addons:create heroku-redis:premium-0 --version 5
from redis import ConnectionPool, Redis
import os
connection_pool = ConnectionPool.from_url(os.environ.get('REDIS_URL'))
app.redis = Redis(connection_pool=connection_pool, ssl_cert_reqs=None)
My mistake was not doing both at the same time.
An ideal solution would explain how to configure TLS certification for Redis 6.
The docs are actually incorrect; you have to set SSL verification to verify_none because TLS happens automatically.
From Heroku support:
"Our data infrastructure uses self-signed certificates so certificates
can be cycled regularly... you need to set the verify_mode
configuration variable to OpenSSL::SSL::VERIFY_NONE"
I solved this by setting ssl_params to verify_none:
ssl_params: { verify_mode: OpenSSL::SSL::VERIFY_NONE }
For me it was where I configure Redis (in a Sidekiq initializer):
# config/initializers/sidekiq.rb
Sidekiq.configure_client do |config|
  config.redis = { url: ENV['REDIS_URL'], size: 1, network_timeout: 5,
                   ssl_params: { verify_mode: OpenSSL::SSL::VERIFY_NONE } }
end

Sidekiq.configure_server do |config|
  config.redis = { url: ENV['REDIS_URL'], size: 7, network_timeout: 5,
                   ssl_params: { verify_mode: OpenSSL::SSL::VERIFY_NONE } }
end
On Heroku (assuming the Heroku Redis add-on), the Redis TLS route already has the ssl_cert_reqs param sorted out. A common oversight that can cause errors like this on Heroku is using REDIS_URL instead of REDIS_TLS_URL.
Solution:
redis_url = os.environ.get('REDIS_TLS_URL')
This solution works with redis 6 and python on Heroku
import os, redis
redis_url = os.getenv('REDIS_URL')
redis_store = redis.from_url(redis_url, ssl_cert_reqs=None)
In my local development environment I do not use Redis with the rediss scheme, so I use a function like this to make it work in both cases:
def get_redis_store():
    '''
    Get a Redis connection based on the URL configured
    in the env variable REDIS_URL.

    Returns
    -------
    redis.Redis
    '''
    redis_url = os.getenv('REDIS_URL')
    if redis_url.startswith('rediss://'):
        redis_store = redis.from_url(redis_url, ssl_cert_reqs=None)
    else:
        redis_store = redis.from_url(redis_url)
    return redis_store
If you are using the django-rq wrapper and trying to deal with this, be sure not to use the URL parameter together with SSL_CERT_REQS. There is an outstanding issue that describes all of this, but basically you need to specify each connection parameter instead of using the URL; see the sketch below.
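A hedged sketch of what that could look like, with each connection parameter spelled out instead of the URL key (the environment variable names and defaults are placeholders, not from the original answer):
# settings.py (sketch) -- explicit connection params instead of a single URL
import os

RQ_QUEUES = {
    'default': {
        'HOST': os.environ.get('REDIS_HOST', 'localhost'),   # placeholder
        'PORT': int(os.environ.get('REDIS_PORT', 6379)),
        'PASSWORD': os.environ.get('REDIS_PASSWORD', ''),
        'DB': 0,
        'SSL': True,
        'SSL_CERT_REQS': None,   # skip verification of the self-signed certificate
    },
}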

Aerospike - “n_bytes_memory went negative” with in memory-only namespace with ttl

We have a namespace configured to store data in memory only, with a default TTL of a couple of minutes. After we start putting data into it and expiration kicks in, we get these messages in the log (a lot of them, for ~30% of expired records):
WARNING (namespace): (namespace.c::762) set_id 1 - n_bytes_memory went negative!
I have a simple client app with a server config that can reproduce this: https://github.com/akkomar/aerospike-test (it's based on Docker and is very easy to start)
Any advice on what might be the reason?
Edit:
I checked this on versions 3.6.4, 3.7.0.1 and 3.7.4
Configuration file used for testing (from https://github.com/akkomar/aerospike-test/blob/master/etc/aerospike.conf):
service {
    user root
    group root
    paxos-single-replica-limit 1
    pidfile /var/run/aerospike/asd.pid
    service-threads 4
    transaction-queues 4
    transaction-threads-per-queue 4
    proto-fd-max 1024
}

logging {
    file /var/log/aerospike/aerospike.log {
        context any info
    }
    console {
        context any info
        context namespace detail
    }
}

network {
    service {
        address any
        port 3000
    }

    heartbeat {
        mode mesh
        port 3002
        mesh-port 3002
        interval 150
        timeout 10
    }

    fabric {
        port 3001
    }

    info {
        port 3003
    }
}

namespace test_ns {
    replication-factor 2
    memory-size 1G
    default-ttl 10S
    storage-engine memory
}
Edit 2:
It seems this happens only if I update records via a UDF. The simplest one that reproduces it:
local VAL_KEY = "v"

function add_data(rec, val_to_add, ttl_to_set)
    if aerospike:exists(rec) then
        rec[VAL_KEY] = val_to_add
        aerospike:update(rec)
    else
        rec[VAL_KEY] = val_to_add
        aerospike:create(rec)
    end
end
When I execute the same operation via the Java API, everything seems to work fine (the example GitHub repo mentioned earlier has been updated with a Java API example).
The error message means that the space we have accounted for the set in memory went negative, which should not be possible.
This has been logged in our internal bug-tracking system for resolution in future releases.
It turned out it was a bug in Aerospike.
It's fixed in version 3.7.4.1 (detailed explanation in https://discuss.aerospike.com/t/problem-with-expiring-records-in-memory-only-namespace-n-bytes-memory-went-negative/2560/6)

How can I use the Chef JSON to set a redis and sidekiq configuration

I'm using AWS OpsWorks for a Rails application with Redis and Sidekiq and would like to do the following:
Override the maxmemory config for redis
Only run Redis & Sidekiq on a selected EC2 instance
My current JSON config only has the database.yml overrides:
{
  "deploy": {
    "appname": {
      "database": {
        "username": "user",
        "password": "password",
        "database": "db_production",
        "host": "db.host.com",
        "adapter": "mysql2"
      }
    }
  }
}
Override the maxmemory config for redis
Take a look and see if your Redis cookbook of choice gives you an attribute to set that / a way to provide custom config values. I know the main redisio one lets you set config values, as I do on my stacks (I set the path to the on-disk cache, I believe); a hypothetical example follows.
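For instance, with the redisio cookbook a custom stack JSON override might look roughly like the following; the exact attribute layout depends on the cookbook and version you use, so treat this as a hypothetical sketch rather than a verified configuration:
{
  "redisio": {
    "servers": [
      {
        "port": 6379,
        "maxmemory": "256mb"
      }
    ]
  }
}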
Only run Redis & Sidekiq on a selected EC2 instance
This part is easy: create a Layer for Redis (or Redis/Sidekiq) and add an instance to that layer.
Now, because Redis is on a different instance than your Rails server, you won't necessarily know what the IP address for your Redis server is. Especially since you'll probably want to use the internal EC2 IP address vs the public IP address for the box (using the internal address means you're already inside the default firewall).
Sooo... what you'll probably need to do is write a custom cookbook for your app, if you haven't already. In your attributes/default.rb, write some code like this:
redis_instance_details = nil
redis_stack_name = "REDIS"
redis_instance_name, redis_instance_details = node["opsworks"]["layers"][redis_stack_name]["instances"].first
redis_server_dns = "127.0.0.1"
if redis_instance_details
  redis_server_dns = redis_instance_details["private_dns_name"]
end
Then later in the attributes file, set your Redis config from that redis_server_dns, for example using it to set:
default[:deploy][appname][:environment_variables][:REDIS_URL] = "redis://#{redis_server_dns}:#{redis_port_number}"
Hope this helps!

ServiceStack.Redis.Sentinel Usage

I'm running a licensed version of ServiceStack and trying to get a sentinel cluster setup on Google Cloud Compute.
The cluster is basically GCE's click-to-deploy Redis solution - 3 servers. Here is the code I'm using to initialize it...
var hosts = Settings.Redis.Host.Split(';');
var sentinel = new ServiceStack.Redis.RedisSentinel(hosts, "master");
redis = sentinel.Setup();
container.Register<IRedisClientsManager>(redis);
container.Register<ICacheClient>(redis.GetCacheClient());
The client works fine, but once I shut down one of the Redis instances everything craps the bed. The client complains about not being able to connect to the missing instance. Additionally, even when I bring the instance back up, it is in READ ONLY mode, so everything still fails. There doesn't seem to be a way to recover once you are in this state...
Am I doing something wrong? Is there some reason that the RedisSentinel client doesn't figure out who the new master is? I feed it all 3 host IP addresses...
You should only be supplying the host of the Redis Sentinel Server to RedisSentinel as it gets the active list of other master/slave redis servers from the Sentinel host.
Some changes to RedisSentinel were recently added in the latest v4.0.37 that's now available on MyGet, which include extra logging and callbacks for Redis Sentinel events. The new v4.0.37 API looks like:
var sentinel = new RedisSentinel(sentinelHost, masterName);
Starting the RedisSentinel will connect to the Sentinel host and return a pre-configured RedisClientManager (i.e. a Redis connection pool) with the active master/slave hosts:
var redisManager = sentinel.Start();
Which you can then register in the IOC with:
container.Register<IRedisClientsManager>(redisManager);
The RedisSentinel should then listen to master/slave changes from the Sentinel hosts and fail over the redisManager accordingly. The existing connections in the pool are then disposed and replaced with a new pool for the newly configured hosts. Any active connections outside of the pool will throw connection exceptions if used again; the next time a RedisClient is retrieved from the pool, it will be configured with the new hosts.
Callbacks and Logging
Here's an example of how you can use the new callbacks to introspect the RedisServer events:
var sentinel = new RedisSentinel(sentinelHost, masterName)
{
    OnFailover = manager =>
    {
        "Redis Managers were Failed Over to new hosts".Print();
    },
    OnWorkerError = ex =>
    {
        "Worker error: {0}".Print(ex);
    },
    OnSentinelMessageReceived = (channel, msg) =>
    {
        "Received '{0}' on channel '{1}' from Sentinel".Print(msg, channel);
    },
};
Logging of these events can also be enabled by configuring Logging in ServiceStack:
LogManager.LogFactory = new ConsoleLogFactory(debugEnabled:false);
There's also an additional explicit FailoverToSentinelHosts() that can be used to force RedisSentinel to re-lookup and fail over to the latest master/slave hosts, e.g.:
var sentinelInfo = sentinel.FailoverToSentinelHosts();
The new hosts are available in the returned sentinelInfo:
"Failed over to read/write: {0}, read-only: {1}".Print(
sentinelInfo.RedisMasters, sentinelInfo.RedisSlaves);