Registered Targets Disappear - amazon-eks

I have a working EKS cluster. It is using a ALB for ingress.
When I apply a service and then an ingress most of these work as expected. However some target groups eventually have no registered targets. If I get the service IP address kubectl describe svc my-service-name and manually register the EndPoints in the target group the pods are reachable again but that's not a sustainable process.
Any ideas on what might be happening? Why doesn't EKS find the target groups as pods cycle?
Each service (secrets, deployment, service and ingress consists of a set of .yaml files applied like:
deploy.sh
#!/bin/bash
set -e
kubectl apply -f ./secretsMap.yaml
kubectl apply -f ./configMap.yaml
kubectl apply -f ./deployment.yaml
kubectl apply -f ./service.yaml
kubectl apply -f ./ingress.yaml
service.yaml
apiVersion: v1
kind: Service
metadata:
name: "site-bob"
namespace: "next-sites"
spec:
ports:
- port: 80
targetPort: 3000
protocol: TCP
type: NodePort
selector:
app: "site-bob"
ingress.yaml
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
name: "site-bob"
namespace: "next-sites"
annotations:
kubernetes.io/ingress.class: alb
alb.ingress.kubernetes.io/tags: Environment=Production,Group=api
alb.ingress.kubernetes.io/backend-protocol: HTTP
alb.ingress.kubernetes.io/ip-address-type: ipv4
alb.ingress.kubernetes.io/target-type: ip
alb.ingress.kubernetes.io/scheme: internet-facing
alb.ingress.kubernetes.io/listen-ports: '[{"HTTP":80},{"HTTPS":443}]'
alb.ingress.kubernetes.io/load-balancer-name: eks-ingress-1
alb.ingress.kubernetes.io/group.name: eks-ingress-1
alb.ingress.kubernetes.io/certificate-arn: arn:aws:acm:us-east-2:402995436123:certificate/9db9dce3-055d-4655-842e-xxxxx
alb.ingress.kubernetes.io/healthcheck-port: traffic-port
alb.ingress.kubernetes.io/healthcheck-path: /
alb.ingress.kubernetes.io/healthcheck-interval-seconds: '30'
alb.ingress.kubernetes.io/healthcheck-timeout-seconds: '16'
alb.ingress.kubernetes.io/success-codes: 200,201
alb.ingress.kubernetes.io/healthy-threshold-count: '2'
alb.ingress.kubernetes.io/unhealthy-threshold-count: '2'
alb.ingress.kubernetes.io/load-balancer-attributes: idle_timeout.timeout_seconds=60
alb.ingress.kubernetes.io/target-group-attributes: deregistration_delay.timeout_seconds=30
alb.ingress.kubernetes.io/actions.ssl-redirect: >
{
"type": "redirect",
"redirectConfig": { "Protocol": "HTTPS", "Port": "443", "StatusCode": "HTTP_301"}
}
alb.ingress.kubernetes.io/actions.svc-host: >
{
"type":"forward",
"forwardConfig":{
"targetGroups":[
{
"serviceName":"site-bob",
"servicePort": 80,"weight":20}
],
"targetGroupStickinessConfig":{"enabled":true,"durationSeconds":200}
}
}
labels:
app: site-bob
spec:
rules:
- host: "staging-bob.imgeinc.net"
http:
paths:
- path: /
pathType: Prefix
backend:
service:
name: ssl-redirect
port:
name: use-annotation
- backend:
service:
name: svc-host
port:
name: use-annotation
pathType: ImplementationSpecific

Something in my configuration added tagged two security groups as being owned by the cluster. When I checked the load balancer controller logs:
kubectl logs -n kube-system aws-load-balancer-controller-677c7998bb-l7mwb
I saw many lines like:
{"level":"error","ts":1641996465.6707578,"logger":"controller-runtime.manager.controller.targetGroupBinding","msg":"Reconciler error","reconciler group":"elbv2.k8s.aws","reconciler kind":"TargetGroupBinding","name":"k8s-nextsite-sitefest-89a6f0ff0a","namespace":"next-sites","error":"expect exactly one securityGroup tagged with kubernetes.io/cluster/imageinc-next-eks-4KN4v6EX for eni eni-0c5555fb9a87e93ad, got: [sg-04b2754f1c85ac8b9 sg-07b026b037dd4d6a4]"}
sg-07b026b037dd4d6a4 has description: EKS created security group applied to ENI that is attached to EKS Control Plane master nodes, as well as any managed workloads.
sg-04b2754f1c85ac8b9 has description: Security group for all nodes in the cluster.
I removed the tag:
{
Key: 'kubernetes.io/cluster/_cluster name_',
value:'owned'
}
from sg-04b2754f1c85ac8b9
and the TargetGroups started to fill in and everything is now working. Both groups were created and tagged by Terraform. I suspect my worker group configuration is off.

facing the same issue when creating the cluster with terraform. Solved updating aws load balancer controller from 2.3 to 2.4.4

Related

AWS EKS ingress - Entity too large

I am running Laravel 8 api in the cluster and I have this ingress:
---
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
annotations:
alb.ingress.kubernetes.io/scheme: internet-facing
alb.ingress.kubernetes.io/target-type: ip
kubectl.kubernetes.io/last-applied-configuration: |
{"apiVersion":"extensions/v1beta1","kind":"Ingress","metadata":{"annotations":{"alb.ingress.kubernetes.io/scheme":"internet-facing","alb.ingress.kubernetes.io/target-type":"ip","kubernetes.io/ingress.class":"alb"},"labels":{"app":"voterapi"},"name":"rapp-ingress","namespace":"voterapp"},"spec":{"rules":[{"http":{"paths":[{"backend":{"serviceName":"app-service","servicePort":80},"path":"/*"}]}}]}}
kubernetes.io/ingress.class: alb
nginx.ingress.kubernetes.io/proxy-body-size: 100m
creationTimestamp: "2022-05-26T08:25:50Z"
finalizers:
- ingress.k8s.aws/resources
generation: 1
labels:
app: appapi
name: app-ingress
namespace: app
resourceVersion: "94262558"
uid: ec29661a-f4be-4ae1-a0e0-29c3d8bff0e5
spec:
rules:
- http:
paths:
- backend:
service:
name: app-service
port:
number: 80
path: /*
pathType: ImplementationSpecific
status:
loadBalancer:
ingress:
- hostname: XXX
I am trying to upload the file using the API and I am getting
413 Request Entity Too Large
I dont see this error in my PHP log so looks like it is not even getting there.
Can anyone help me to solve the issue?
Update: Try to update your ingress adding nginx.org/client-max-body-size
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
name: app-ingress
annotations:
nginx.org/proxy-read-timeout: "40s"
nginx.org/proxy-connect-timeout: "40s"
nginx.org/client-max-body-size: "100m"
In some cases, you might need to increase the max size for all post body data and file uploads.
Try to update the post_max_size and upload_max_file_size values in the php.ini configuration:
post_max_size = 100M
upload_max_filesize = 100M
Reference:
NGINX Ingress Controller to increase the client request body
NGINX Ingress Controller: Advanced Configuration with Annotations
https://laracasts.com/discuss/channels/laravel/increase-file-upload-size

Metallb L2Advertisement/IPAdressPools assignments behave strangely

I'm using metallb 0.13.4 L2, I have below IP advertisements and pools. Nginx grabs the right IP address and metallb speakers announce it properly. So IP addresses are correctly assigned.
---
apiVersion: metallb.io/v1beta1
kind: L2Advertisement
metadata:
name: external-advertisement
namespace: metallb-system
spec:
ipAddressPools:
- external-pool
nodeSelectors:
- matchLabels:
kubernetes.io/os: linux
kubernetes.io/arch: amd64
---
apiVersion: metallb.io/v1beta1
kind: L2Advertisement
metadata:
name: internal-advertisement
namespace: metallb-system
spec:
ipAddressPools:
- internal-pool
nodeSelectors:
- matchLabels:
kubernetes.io/os: linux
---
apiVersion: metallb.io/v1beta1
kind: IPAddressPool
metadata:
name: external-pool
namespace: metallb-system
spec:
addresses:
- x.x.x.204/32
---
apiVersion: metallb.io/v1beta1
kind: IPAddressPool
metadata:
name: internal-pool
namespace: metallb-system
spec:
addresses:
- x.x.x.203/32
Nginx configs
....
controller:
annotations:
metallb.universe.tf/address-pool: external-pool
....
---
....
controller:
annotations:
metallb.universe.tf/address-pool: internal-pool
....
and from nginx controller events
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Normal nodeAssigned 4m6s (x1173 over 19h) metallb-speaker announcing from node [redacted] with protocol "layer2"
See the (x1173 over 19h) so weird ? And when I look at the Ingresses their IP addresses change constantly but they are assigned to either internal or external nginx classes.
$ kl get ingressclass
NAME CONTROLLER PARAMETERS AGE
nginx k8s.io/ingress-nginx <none> 5d6h
nginx-internal k8s.io/ingress-nginx <none> 5d6h
Although Ingress IPs constantly change between x.x.x.203 and x.x.x.204???, they always responds on the assigned IP address!!! This definitely looks very strange.
Note: I wasn't sure about the help in metallb project, that's why I'm creating the issue here.
The problem was the annotations on the controller, they should be at controller.service; Here is the working configuration;
controller:
service:
externalTrafficPolicy: Local
type: LoadBalancer
loadBalancerIP: x.x.x.203
annotations:
metallb.universe.tf/address-pool: "internal-pool"
Additionally, service must be type LoadBalancer and IP is specified.

NIFI CLUSTER AND ZOOKEEPER CLUSTER

I want to configure a NIFI Cluster with external TLS zookeeper cluster (deployed in a kubernetes cluster). All is ok (quorum, zookeeper tls...) but when I set the zookeeper connection string to … myzk:3181,myzk2:3181… and Nifi tries connect to zookeeper cluster, I get this message :
io.netty.handler.codec.DecoderException: io.netty.handler.ssl.NotSslRecordException: not an SSL/TLS record: 0000002d0000
I think that is because Nifi is talking with zookeeper HTTP and the 3181 is HTTPS
Thanks in advance, regards
NIFI Version : 1.12.1
Zookeeper 3.7.0 (QUORUM IS OK)
#nifi.properties
# Site to Site properties
nifi.remote.input.host=nifi-0.nifi-headless.nifi-pro.svc.cluster.local
nifi.remote.input.secure=true
nifi.remote.input.socket.port=10443
nifi.remote.input.http.enabled=true
nifi.remote.input.http.transaction.ttl=30 sec
nifi.remote.contents.cache.expiration=30 secs
# web properties #
nifi.web.war.directory=./lib
nifi.web.proxy.host=my_proxy.com
nifi.web.http.port=
nifi.web.https.port=9443
nifi.web.http.host=nifi-0.nifi-headless.nifi-pro.svc.cluster.local
nifi.web.http.network.interface.default=eth0
nifi.web.https.host=nifi-0.nifi-headless.nifi-pro.svc.cluster.local
nifi.web.https.network.interface.default=
nifi.web.jetty.working.directory=./work/jetty
nifi.web.jetty.threads=200
# nifi.web.proxy.context.path=
# security properties #
nifi.sensitive.props.key=
nifi.sensitive.props.key.protected=
nifi.sensitive.props.algorithm=PBEWITHMD5AND256BITAES-CBC-OPENSSL
nifi.sensitive.props.provider=BC
nifi.sensitive.props.additional.keys=
nifi.security.keystore=/opt/nifi/nifi-current/config-data/certs/keystore.jks
nifi.security.keystoreType=jks
nifi.security.keystorePasswd=tym6nSAHI7xwnqUdwi4OGn2RpXtq9zLpqurol1lLqVg
nifi.security.keyPasswd=tym6nSAHI7xwnqUdwi4OGn2RpXtq9zLpqurol1lLqVg
nifi.security.truststore=/opt/nifi/nifi-current/config-data/certs/truststore.jks
nifi.security.truststoreType=jks
nifi.security.truststorePasswd=wRbjBPa62GLnlWaGMIMg6Ak6n+AyCeUKEquGSwyJt24
nifi.security.needClientAuth=true
nifi.security.user.authorizer=managed-authorizer
nifi.security.user.login.identity.provider=
nifi.security.ocsp.responder.url=
nifi.security.ocsp.responder.certificate=
# OpenId Connect SSO Properties #
nifi.security.user.oidc.discovery.url=https://my_url_oidc
nifi.security.user.oidc.connect.timeout=5 secs
nifi.security.user.oidc.read.timeout=5 secs
nifi.security.user.oidc.client.id=lkasdnlnsda
nifi.security.user.oidc.client.secret=fdjksalfnslknasfiDHn
nifi.security.user.oidc.preferred.jwsalgorithm=
nifi.security.user.oidc.claim.identifying.user=email
nifi.security.user.oidc.additional.scopes=
# Apache Knox SSO Properties #
nifi.security.user.knox.url=
nifi.security.user.knox.publicKey=
nifi.security.user.knox.cookieName=hadoop-jwt
nifi.security.user.knox.audiences=
# Identity Mapping Properties #
# These properties allow normalizing user identities such that identities coming from different identity providers
# (certificates, LDAP, Kerberos) can be treated the same internally in NiFi. The following example demonstrates normalizing
# DNs from certificates and principals from Kerberos into a common identity string:
#
# nifi.security.identity.mapping.pattern.dn=^CN=(.*?), OU=(.*?), O=(.*?), L=(.*?), ST=(.*?), C=(.*?)$
# nifi.security.identity.mapping.value.dn=$1#$2
# nifi.security.identity.mapping.pattern.kerb=^(.*?)/instance#(.*?)$
# nifi.security.identity.mapping.value.kerb=$1#$2
# cluster common properties (all nodes must have same values) #
nifi.cluster.protocol.heartbeat.interval=5 sec
nifi.cluster.protocol.is.secure=true
# cluster node properties (only configure for cluster nodes) #
nifi.cluster.is.node=true
nifi.cluster.node.address=nifi-0.nifi-headless.nifi-pro.svc.cluster.local
nifi.cluster.node.protocol.port=11443
nifi.cluster.node.protocol.threads=10
nifi.cluster.node.protocol.max.threads=50
nifi.cluster.node.event.history.size=25
nifi.cluster.node.connection.timeout=5 sec
nifi.cluster.node.read.timeout=5 sec
nifi.cluster.node.max.concurrent.requests=100
nifi.cluster.firewall.file=
nifi.cluster.flow.election.max.wait.time=1 mins
nifi.cluster.flow.election.max.candidates=
# zookeeper properties, used for cluster management #
nifi.zookeeper.connect.string=nifi-zk:2181
nifi.zookeeper.connect.timeout=3 secs
nifi.zookeeper.session.timeout=3 secs
nifi.zookeeper.root.node=/nifi
nifi.zookeeper.client.secure=true
## BY DEFAULT, NIFI CLIENT WILL USE nifi.security.* if you require separate keystore and truststore uncomment below section
nifi.zookeeper.security.keystore=/opt/nifi/nifi-current/config-data/certs/zk/keystore.jks
nifi.zookeeper.security.keystoreType=JKS
nifi.zookeeper.security.keystorePasswd=123456
nifi.zookeeper.security.truststore=/opt/nifi/nifi-current/config-data/certs/zk/truststore.jks
nifi.zookeeper.security.truststoreType=JKS
nifi.zookeeper.security.truststorePasswd=123456
# Zookeeper properties for the authentication scheme used when creating acls on znodes used for cluster management
# Values supported for nifi.zookeeper.auth.type are "default", which will apply world/anyone rights on znodes
# and "sasl" which will give rights to the sasl/kerberos identity used to authenticate the nifi node
# The identity is determined using the value in nifi.kerberos.service.principal and the removeHostFromPrincipal
# and removeRealmFromPrincipal values (which should align with the kerberos.removeHostFromPrincipal and kerberos.removeRealmFromPrincipal
# values configured on the zookeeper server).
nifi.zookeeper.auth.type=
nifi.zookeeper.kerberos.removeHostFromPrincipal=
nifi.zookeeper.kerberos.removeRealmFromPrincipal=
# kerberos #
nifi.kerberos.krb5.file=
# kerberos service principal #
nifi.kerberos.service.principal=
nifi.kerberos.service.keytab.location=
# kerberos spnego principal #
nifi.kerberos.spnego.principal=
nifi.kerberos.spnego.keytab.location=
nifi.kerberos.spnego.authentication.expiration=12 hours
# external properties files for variable registry
# supports a comma delimited list of file locations
nifi.variable.registry.properties=
You usually see this when you have a HTTP vs HTTPS mismatch
ideally you would be calling your service over the HTTP
spring:
cloud:
gateway:
discovery:
locator:
url-expression: "'lb:http://'+serviceId"
For reference using client port 2181
apiVersion: v1
kind: Service
metadata:
name: zk-hs
labels:
app: zk
spec:
ports:
- port: 2888
name: server
- port: 3888
name: leader-election
clusterIP: None
selector:
app: zk
---
apiVersion: v1
kind: Service
metadata:
name: zk-cs
labels:
app: zk
spec:
ports:
- port: 2181
name: client
selector:
app: zk
---
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
name: zk-pdb
spec:
selector:
matchLabels:
app: zk
maxUnavailable: 1
---
apiVersion: apps/v1
kind: StatefulSet
metadata:
name: zk
spec:
selector:
matchLabels:
app: zk
serviceName: zk-hs
replicas: 3
updateStrategy:
type: RollingUpdate
podManagementPolicy: OrderedReady
template:
metadata:
labels:
app: zk
spec:
affinity:
podAntiAffinity:
requiredDuringSchedulingIgnoredDuringExecution:
- labelSelector:
matchExpressions:
- key: "app"
operator: In
values:
- zk
topologyKey: "kubernetes.io/hostname"
containers:
- name: kubernetes-zookeeper
imagePullPolicy: Always
image: "k8s.gcr.io/kubernetes-zookeeper:1.0-3.4.10"
resources:
requests:
memory: "1Gi"
cpu: "0.5"
ports:
- containerPort: 2181
name: client
- containerPort: 2888
name: server
- containerPort: 3888
name: leader-election
command:
- sh
- -c
- "start-zookeeper \
--servers=3 \
--data_dir=/var/lib/zookeeper/data \
--data_log_dir=/var/lib/zookeeper/data/log \
--conf_dir=/opt/zookeeper/conf \
--client_port=2181 \
--election_port=3888 \
--server_port=2888 \
--tick_time=2000 \
--init_limit=10 \
--sync_limit=5 \
--heap=512M \
--max_client_cnxns=60 \
--snap_retain_count=3 \
--purge_interval=12 \
--max_session_timeout=40000 \
--min_session_timeout=4000 \
--log_level=INFO"
readinessProbe:
exec:
command:
- sh
- -c
- "zookeeper-ready 2181"
initialDelaySeconds: 10
timeoutSeconds: 5
livenessProbe:
exec:
command:
- sh
- -c
- "zookeeper-ready 2181"
initialDelaySeconds: 10
timeoutSeconds: 5
volumeMounts:
- name: datadir
mountPath: /var/lib/zookeeper
securityContext:
runAsUser: 1000
fsGroup: 1000
volumeClaimTemplates:
- metadata:
name: datadir
spec:
accessModes: [ "ReadWriteOnce" ]
resources:
requests:
storage: 10Gi
NiFi did not support TLS with ZooKeeper until release 1.13.0. If you're using NiFi 1.12.1 it will not support configuring TLS for ZooKeeper.

EKS GlobalNetworkPolicies default-deny with pod exceptions

Currently I have a GlobalNetworkPolicy 'default-deny' to limit all traffic within my cluster, all ingress/egress is set to deny for all().
I have attempted to allow exceptions for certain labels pods, using 'order'.
When I don't specify 'action' arguments so that it allows all communication, the policy works.
Although as below when I specify arguments within the allow, the pod doesn't allows egress traffic.
apiVersion: projectcalico.org/v3
kind: GlobalNetworkPolicy
metadata:
name: allow-pod-ingress
spec:
order: 50
selector: name == 'egresspod'
types:
- Egress
ingress:
- action: Allow
protocol: TCP
source:
selector: some-pod-label == 'some-pod-label-value'
destination:
ports:
- 80
Is this policy configured correctly?
Types: has to match the spec. You have it set to Egress, whereas you defined ingress rules.
If you want egresspod to accept inbound traffic on port 80, then try to change the type to Ingress. (If you want to achieve the opposite, then change both to Egress.)
apiVersion: projectcalico.org/v3
kind: GlobalNetworkPolicy
metadata:
name: allow-pod-ingress
spec:
order: 50
selector: name == 'egresspod'
types:
- Ingress #Has to match
ingress: # With this guy.
- action: Allow
protocol: TCP
source:
selector: some-pod-label == 'some-pod-label-value'
destination:
ports:
- 80
For more information, check this page: https://docs.projectcalico.org/v3.7/reference/calicoctl/resources/globalnetworkpolicy

How to create service account for Spinnaker

I want to automate pipeline triggers by using fiat service account. So I follow the Spinnaker doc: https://www.spinnaker.io/setup/security/authorization/service-accounts/ Then i have trouble to run the curl command. Where should I run it? I tried to run in local machine which is installed halyard and fiat pod in Kubernetes. However, I got cannot resolve http://front50.url:8080.
Create Role for spinnaker with role name spinnaker-role you can edit role as per you need
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
name: spinnaker-role
namespace: default
rules:
- apiGroups: [""]
resources: ["namespaces", "configmaps", "events", "replicationcontrollers", "serviceaccounts", "pods/logs"]
verbs: ["get", "list"]
- apiGroups: [""]
resources: ["pods", "services", "secrets"]
verbs: ["*"]
- apiGroups: ["autoscaling"]
resources: ["horizontalpodautoscalers"]
verbs: ["list", "get"]
- apiGroups: [“apps”]
resources: [“controllerrevisions”, "statefulsets"]
verbs: [“list”]
- apiGroups: ["extensions", "app"]
resources: ["deployments", "replicasets", "ingresses"]
verbs: ["*"]
Service account for spinnaker
apiVersion: v1
kind: ServiceAccount
metadata:
name: spinnaker-service-account
namespace: default
Main part role binding
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
name: spinnaker-role-binding
namespace: default
roleRef:
apiGroup: rbac.authorization.k8s.io
kind: Role
name: spinnaker-role
subjects:
- namespace: default
kind: ServiceAccount
name: spinnaker-service-account
You can edit it as per your need changing statefulset adding deployments
This url is just an example and won't work. You need to access it using the service that exposes front50. If you installed using Halyard, probably the service is exposed as spin-front50:8080
I ran it in halyard and used the URL
(I know its really long time after your question :), I just happened to see this and it's better late than never.)
You have to port-forward into the pod, and curl your localhost with the port created for that pod, during port-forwarding.