MetalLB L2Advertisement/IPAddressPool assignments behave strangely - metallb

I'm using MetalLB 0.13.4 in L2 mode, and I have the L2Advertisements and IPAddressPools below. Nginx grabs the right IP address and the MetalLB speakers announce it properly, so the IP addresses are assigned correctly.
---
apiVersion: metallb.io/v1beta1
kind: L2Advertisement
metadata:
  name: external-advertisement
  namespace: metallb-system
spec:
  ipAddressPools:
    - external-pool
  nodeSelectors:
    - matchLabels:
        kubernetes.io/os: linux
        kubernetes.io/arch: amd64
---
apiVersion: metallb.io/v1beta1
kind: L2Advertisement
metadata:
  name: internal-advertisement
  namespace: metallb-system
spec:
  ipAddressPools:
    - internal-pool
  nodeSelectors:
    - matchLabels:
        kubernetes.io/os: linux
---
apiVersion: metallb.io/v1beta1
kind: IPAddressPool
metadata:
  name: external-pool
  namespace: metallb-system
spec:
  addresses:
    - x.x.x.204/32
---
apiVersion: metallb.io/v1beta1
kind: IPAddressPool
metadata:
  name: internal-pool
  namespace: metallb-system
spec:
  addresses:
    - x.x.x.203/32
Nginx configs
....
controller:
  annotations:
    metallb.universe.tf/address-pool: external-pool
....
---
....
controller:
  annotations:
    metallb.universe.tf/address-pool: internal-pool
....
and from the nginx controller events:
Events:
  Type    Reason        Age                    From             Message
  ----    ------        ---                    ----             -------
  Normal  nodeAssigned  4m6s (x1173 over 19h)  metallb-speaker  announcing from node [redacted] with protocol "layer2"
See the (x1173 over 19h)? So weird. And when I look at the Ingresses, their IP addresses change constantly, even though each one is assigned to either the internal or the external nginx class.
$ kl get ingressclass
NAME             CONTROLLER             PARAMETERS   AGE
nginx            k8s.io/ingress-nginx   <none>       5d6h
nginx-internal   k8s.io/ingress-nginx   <none>       5d6h
Although the Ingress IPs constantly change between x.x.x.203 and x.x.x.204, they always respond on the assigned IP address. This definitely looks very strange.
Note: I wasn't sure whether the MetalLB project was the right place to ask for help, which is why I'm asking here.

The problem was the placement of the annotations on the controller: they should be under controller.service. Here is the working configuration:
controller:
  service:
    externalTrafficPolicy: Local
    type: LoadBalancer
    loadBalancerIP: x.x.x.203
    annotations:
      metallb.universe.tf/address-pool: "internal-pool"
Additionally, the service must be of type LoadBalancer and the loadBalancerIP must be specified.
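To double-check the fix, it helps to confirm that each controller Service actually received the IP from its pool and that the announcement events have stopped churning. A minimal sketch; the ingress-nginx namespace and the Service name placeholder are assumptions, adjust them to your Helm releases:
# Show the EXTERNAL-IP assigned to each nginx controller Service
kubectl get svc -n ingress-nginx -o wide
# Events should show a stable nodeAssigned announcement instead of constant re-announcements
kubectl describe svc <internal-nginx-controller-service> -n ingress-nginx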

Related

AWS EKS ingress - Entity too large

I am running a Laravel 8 API in the cluster and I have this Ingress:
---
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  annotations:
    alb.ingress.kubernetes.io/scheme: internet-facing
    alb.ingress.kubernetes.io/target-type: ip
    kubectl.kubernetes.io/last-applied-configuration: |
      {"apiVersion":"extensions/v1beta1","kind":"Ingress","metadata":{"annotations":{"alb.ingress.kubernetes.io/scheme":"internet-facing","alb.ingress.kubernetes.io/target-type":"ip","kubernetes.io/ingress.class":"alb"},"labels":{"app":"voterapi"},"name":"rapp-ingress","namespace":"voterapp"},"spec":{"rules":[{"http":{"paths":[{"backend":{"serviceName":"app-service","servicePort":80},"path":"/*"}]}}]}}
    kubernetes.io/ingress.class: alb
    nginx.ingress.kubernetes.io/proxy-body-size: 100m
  creationTimestamp: "2022-05-26T08:25:50Z"
  finalizers:
    - ingress.k8s.aws/resources
  generation: 1
  labels:
    app: appapi
  name: app-ingress
  namespace: app
  resourceVersion: "94262558"
  uid: ec29661a-f4be-4ae1-a0e0-29c3d8bff0e5
spec:
  rules:
    - http:
        paths:
          - backend:
              service:
                name: app-service
                port:
                  number: 80
            path: /*
            pathType: ImplementationSpecific
status:
  loadBalancer:
    ingress:
      - hostname: XXX
I am trying to upload a file using the API and I am getting
413 Request Entity Too Large
I don't see this error in my PHP log, so it looks like the request is not even reaching it.
Can anyone help me solve this issue?
Update: Try updating your Ingress by adding nginx.org/client-max-body-size:
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: app-ingress
  annotations:
    nginx.org/proxy-read-timeout: "40s"
    nginx.org/proxy-connect-timeout: "40s"
    nginx.org/client-max-body-size: "100m"
In some cases, you might also need to increase the maximum size for all POST body data and file uploads on the PHP side.
Try updating the post_max_size and upload_max_filesize values in the php.ini configuration:
post_max_size = 100M
upload_max_filesize = 100M
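If you are unsure which limits the running container actually uses, a quick check from inside the pod can confirm them before and after the change. A sketch; the voterapp namespace and app=voterapi label come from the pasted last-applied-configuration and may not match your current deployment:
POD=$(kubectl -n voterapp get pod -l app=voterapi -o jsonpath='{.items[0].metadata.name}')
kubectl -n voterapp exec "$POD" -- php -r 'echo ini_get("post_max_size"), " ", ini_get("upload_max_filesize"), PHP_EOL;'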
Reference:
NGINX Ingress Controller to increase the client request body
NGINX Ingress Controller: Advanced Configuration with Annotations
https://laracasts.com/discuss/channels/laravel/increase-file-upload-size

Registered Targets Disappear

I have a working EKS cluster. It is using an ALB for ingress.
When I apply a service and then an ingress, most of these work as expected. However, some target groups eventually end up with no registered targets. If I look up the service's endpoint IP addresses with kubectl describe svc my-service-name and manually register those endpoints in the target group, the pods are reachable again, but that's not a sustainable process.
Any ideas on what might be happening? Why doesn't EKS find the target groups as pods cycle?
Each service (secrets, deployment, service, and ingress) consists of a set of .yaml files applied like:
deploy.sh
#!/bin/bash
set -e
kubectl apply -f ./secretsMap.yaml
kubectl apply -f ./configMap.yaml
kubectl apply -f ./deployment.yaml
kubectl apply -f ./service.yaml
kubectl apply -f ./ingress.yaml
service.yaml
apiVersion: v1
kind: Service
metadata:
  name: "site-bob"
  namespace: "next-sites"
spec:
  ports:
    - port: 80
      targetPort: 3000
      protocol: TCP
  type: NodePort
  selector:
    app: "site-bob"
ingress.yaml
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: "site-bob"
  namespace: "next-sites"
  annotations:
    kubernetes.io/ingress.class: alb
    alb.ingress.kubernetes.io/tags: Environment=Production,Group=api
    alb.ingress.kubernetes.io/backend-protocol: HTTP
    alb.ingress.kubernetes.io/ip-address-type: ipv4
    alb.ingress.kubernetes.io/target-type: ip
    alb.ingress.kubernetes.io/scheme: internet-facing
    alb.ingress.kubernetes.io/listen-ports: '[{"HTTP":80},{"HTTPS":443}]'
    alb.ingress.kubernetes.io/load-balancer-name: eks-ingress-1
    alb.ingress.kubernetes.io/group.name: eks-ingress-1
    alb.ingress.kubernetes.io/certificate-arn: arn:aws:acm:us-east-2:402995436123:certificate/9db9dce3-055d-4655-842e-xxxxx
    alb.ingress.kubernetes.io/healthcheck-port: traffic-port
    alb.ingress.kubernetes.io/healthcheck-path: /
    alb.ingress.kubernetes.io/healthcheck-interval-seconds: '30'
    alb.ingress.kubernetes.io/healthcheck-timeout-seconds: '16'
    alb.ingress.kubernetes.io/success-codes: 200,201
    alb.ingress.kubernetes.io/healthy-threshold-count: '2'
    alb.ingress.kubernetes.io/unhealthy-threshold-count: '2'
    alb.ingress.kubernetes.io/load-balancer-attributes: idle_timeout.timeout_seconds=60
    alb.ingress.kubernetes.io/target-group-attributes: deregistration_delay.timeout_seconds=30
    alb.ingress.kubernetes.io/actions.ssl-redirect: >
      {
        "type": "redirect",
        "redirectConfig": { "Protocol": "HTTPS", "Port": "443", "StatusCode": "HTTP_301" }
      }
    alb.ingress.kubernetes.io/actions.svc-host: >
      {
        "type": "forward",
        "forwardConfig": {
          "targetGroups": [
            { "serviceName": "site-bob", "servicePort": 80, "weight": 20 }
          ],
          "targetGroupStickinessConfig": { "enabled": true, "durationSeconds": 200 }
        }
      }
  labels:
    app: site-bob
spec:
  rules:
    - host: "staging-bob.imgeinc.net"
      http:
        paths:
          - path: /
            pathType: Prefix
            backend:
              service:
                name: ssl-redirect
                port:
                  name: use-annotation
          - backend:
              service:
                name: svc-host
                port:
                  name: use-annotation
            pathType: ImplementationSpecific
Something in my configuration had tagged two security groups as being owned by the cluster. When I checked the load balancer controller logs:
kubectl logs -n kube-system aws-load-balancer-controller-677c7998bb-l7mwb
I saw many lines like:
{"level":"error","ts":1641996465.6707578,"logger":"controller-runtime.manager.controller.targetGroupBinding","msg":"Reconciler error","reconciler group":"elbv2.k8s.aws","reconciler kind":"TargetGroupBinding","name":"k8s-nextsite-sitefest-89a6f0ff0a","namespace":"next-sites","error":"expect exactly one securityGroup tagged with kubernetes.io/cluster/imageinc-next-eks-4KN4v6EX for eni eni-0c5555fb9a87e93ad, got: [sg-04b2754f1c85ac8b9 sg-07b026b037dd4d6a4]"}
sg-07b026b037dd4d6a4 has description: EKS created security group applied to ENI that is attached to EKS Control Plane master nodes, as well as any managed workloads.
sg-04b2754f1c85ac8b9 has description: Security group for all nodes in the cluster.
I removed the tag:
{
  Key: 'kubernetes.io/cluster/_cluster name_',
  value: 'owned'
}
from sg-04b2754f1c85ac8b9, and the target groups started to fill in; everything is now working. Both groups were created and tagged by Terraform. I suspect my worker group configuration is off.
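If you hit the same reconciler error, one way to track down the duplicate ownership tag is sketched below; replace <cluster-name> with your cluster's name, and double-check which group is the EKS-managed one before touching anything:
# List every security group tagged as owned by the cluster
aws ec2 describe-security-groups \
  --filters "Name=tag:kubernetes.io/cluster/<cluster-name>,Values=owned" \
  --query "SecurityGroups[].{Id:GroupId,Description:Description}" --output table
# Remove the ownership tag from the extra (node) security group
aws ec2 delete-tags \
  --resources sg-04b2754f1c85ac8b9 \
  --tags "Key=kubernetes.io/cluster/<cluster-name>,Value=owned"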
I was facing the same issue when creating the cluster with Terraform. Solved it by updating the AWS Load Balancer Controller from 2.3 to 2.4.4.
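For reference, a minimal sketch of that upgrade with Helm, assuming the controller was installed from the eks-charts repository as a release named aws-load-balancer-controller in kube-system (pick the chart version that ships controller v2.4.4 for your cluster):
helm repo add eks https://aws.github.io/eks-charts
helm repo update
helm upgrade aws-load-balancer-controller eks/aws-load-balancer-controller \
  -n kube-system \
  --reuse-values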

EKS GlobalNetworkPolicies default-deny with pod exceptions

Currently I have a 'default-deny' GlobalNetworkPolicy to limit all traffic within my cluster; all ingress/egress is denied for all().
I have attempted to allow exceptions for pods with certain labels, using 'order'.
When I don't specify any 'action' arguments, so that all communication is allowed, the policy works.
But when I specify arguments within the allow rule, as below, the pod does not allow egress traffic.
apiVersion: projectcalico.org/v3
kind: GlobalNetworkPolicy
metadata:
  name: allow-pod-ingress
spec:
  order: 50
  selector: name == 'egresspod'
  types:
    - Egress
  ingress:
    - action: Allow
      protocol: TCP
      source:
        selector: some-pod-label == 'some-pod-label-value'
      destination:
        ports:
          - 80
Is this policy configured correctly?
The types field has to match the rule sections in the spec: you have it set to Egress, whereas you defined ingress rules.
If you want egresspod to accept inbound traffic on port 80, then change the type to Ingress. (If you want to achieve the opposite, change both to Egress.)
apiVersion: projectcalico.org/v3
kind: GlobalNetworkPolicy
metadata:
  name: allow-pod-ingress
spec:
  order: 50
  selector: name == 'egresspod'
  types:
    - Ingress  # has to match
  ingress:     # with this guy
    - action: Allow
      protocol: TCP
      source:
        selector: some-pod-label == 'some-pod-label-value'
      destination:
        ports:
          - 80
For more information, check this page: https://docs.projectcalico.org/v3.7/reference/calicoctl/resources/globalnetworkpolicy
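As a usage note (a sketch, assuming calicoctl is installed and pointed at your cluster, and a hypothetical file name), GlobalNetworkPolicy is a Calico resource, so it is typically applied and inspected with calicoctl rather than plain kubectl:
# Apply the corrected policy
calicoctl apply -f allow-pod-ingress.yaml
# Confirm the policy and its selector were accepted
calicoctl get globalnetworkpolicy allow-pod-ingress -o yaml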

Cert-manager certificates not found and challenges not created

I followed https://docs.cert-manager.io/en/venafi/tutorials/quick-start/index.html from start to end and everything seems to be working, except that I'm not getting an external IP for my ingress.
NAME                   HOSTS                                    ADDRESS   PORTS     AGE
staging-site-ingress   staging.site.io,staging.admin.site.io,             80, 443   1h
However, I'm able to use the nginx ingress controller's external IP and DNS to access the sites. When I go to the URLs I'm redirected to HTTPS, so I assume that part is working fine.
It redirects to HTTPS, but the browser still says "not secure", so no certificate is being issued.
When I'm debugging I get the following information:
Ingress:
Events:
  Type    Reason             Age                From                      Message
  ----    ------             ---                ----                      -------
  Normal  CreateCertificate  54m                cert-manager              Successfully created Certificate "tls-secret-staging"
  Normal  UPDATE             35m (x3 over 1h)   nginx-ingress-controller  Ingress staging/staging-site-ingress
  Normal  CreateCertificate  23m (x2 over 35m)  cert-manager              Successfully created Certificate "letsencrypt-staging-tls"
Certificate:
Status:
  Conditions:
    Last Transition Time:  2019-02-27T14:02:29Z
    Message:               Certificate does not exist
    Reason:                NotFound
    Status:                False
    Type:                  Ready
Events:
  Type    Reason        Age               From          Message
  ----    ------        ---               ----          -------
  Normal  OrderCreated  3m (x2 over 14m)  cert-manager  Created Order resource "letsencrypt-staging-tls-593754378"
Secret:
Name:         letsencrypt-staging-tls
Namespace:    staging
Labels:       certmanager.k8s.io/certificate-name=staging-site-io
Annotations:  <none>
Type:         kubernetes.io/tls
Data
====
ca.crt:   0 bytes
tls.crt:  0 bytes
tls.key:  1679 bytes
Order:
Status:
  Certificate:   <nil>
  Finalize URL:
  Reason:
  State:
  URL:
Events:  <none>
So it seems something goes wrong with the Order and no Challenges are created.
Here are my ingress.yaml and issuer.yaml:
apiVersion: extensions/v1beta1
kind: Ingress
metadata:
  name: staging-site-ingress
  annotations:
    kubernetes.io/ingress.class: "nginx"
    certmanager.k8s.io/issuer: "letsencrypt-staging"
    certmanager.k8s.io/acme-challenge-type: http01
spec:
  tls:
    - hosts:
        - staging.site.io
        - staging.admin.site.io
        - staging.api.site.io
      secretName: letsencrypt-staging-tls
  rules:
    - host: staging.site.io
      http:
        paths:
          - backend:
              serviceName: frontend-service
              servicePort: 80
            path: /
    - host: staging.admin.site.io
      http:
        paths:
          - backend:
              serviceName: frontend-service
              servicePort: 80
            path: /
    - host: staging.api.site.io
      http:
        paths:
          - backend:
              serviceName: gateway-service
              servicePort: 9000
            path: /

apiVersion: certmanager.k8s.io/v1alpha1
kind: Issuer
metadata:
  name: letsencrypt-staging
  namespace: staging
spec:
  acme:
    server: https://acme-staging-v02.api.letsencrypt.org/directory
    email: hello#site.io
    privateKeySecretRef:
      name: letsencrypt-staging-tls
    http01: {}
Does anyone know what I can do to fix this or what went wrong? cert-manager is definitely installed correctly; I'm just not sure about the Ingress and what went wrong with the Order.
Thanks in advance!
EDIT: I found this in the nginx-ingress-controller logs:
W0227 14:51:02.740081 8 controller.go:1078] Error getting SSL certificate "staging/letsencrypt-staging-tls": local SSL certificate staging/letsencrypt-staging-tls was not found. Using default certificate
This line is being spammed, and the CPU load sits constantly at 0.003 with the CPU graph completely full (the other services show almost nothing).
I stumbled over the same issue once, following exactly the same official tutorial.
As #mikebridge mentioned, the issue is a namespace mismatch between the Issuer and the Secret.
For me, the best fix was to switch from an Issuer to a ClusterIssuer, which is not scoped to a single namespace.
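For illustration, a minimal sketch of that switch using the same certmanager.k8s.io/v1alpha1 API as the question; the account-key secret name is my own suggestion (kept separate from the TLS secret the Ingress uses), and the Ingress annotation would change from certmanager.k8s.io/issuer to certmanager.k8s.io/cluster-issuer:
kubectl apply -f - <<EOF
apiVersion: certmanager.k8s.io/v1alpha1
kind: ClusterIssuer
metadata:
  name: letsencrypt-staging
spec:
  acme:
    server: https://acme-staging-v02.api.letsencrypt.org/directory
    # mirrors the obfuscated address from the question; use a real email here
    email: hello#site.io
    privateKeySecretRef:
      name: letsencrypt-staging-account-key
    http01: {}
EOF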
The reason your certificate Order is not completing is that the challenge is failing to complete successfully. Review the solver configuration in your Issuer or ClusterIssuer.
See my answer here for more details:
https://stackoverflow.com/a/75454772/4820940
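To see where the ACME flow stalls, it can also help to walk down the chain of cert-manager resources. A sketch; the resource kinds depend on your cert-manager version, the staging namespace and the Order name are taken from the question's events:
kubectl get certificates,orders,challenges -n staging
kubectl describe order letsencrypt-staging-tls-593754378 -n staging
kubectl describe challenges -n staging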

Kubernetes PersistentVolume and PersistentVolumeClaim could be causing issues for my pod which crashes while copying logs

I have a PersistentVolume that I specified as follows:
apiVersion: v1
kind: PersistentVolume
metadata:
  name: mypv-shared
spec:
  accessModes:
    - ReadWriteMany
  capacity:
    storage: 5Gi
  hostPath:
    path: /data/mypv-shared/
Then I created a PersistentVolumeClaim with the following specifications:
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: mypv-shared-claim
spec:
  accessModes:
    - ReadWriteMany
  resources:
    requests:
      storage: 5Gi
But when I create the PVC, running kubectl get pv shows that it is bound to a randomly generated PV
NAME                                       CAPACITY   ACCESSMODES   RECLAIMPOLICY   STATUS   CLAIM                 STORAGECLASS   REASON   AGE
pvc-38c77920-a223-11e7-89cc-08002719b642   5Gi        RWX           Delete          Bound    default/mypv-shared   standard                16m
I believe this is causing issues for my pods when running tests, because I am not sure whether the pod is correctly mounting the specified directory. My pods crash when trying to copy over the test logs at the end of the run.
Is the cause really the persistentVolume/Claim or should I be looking into something else? Thanks!
Creating the PVC dynamically provisioned a PV instead of using the one you created manually with the hostPath. On the PVC, simply set .spec.storageClassName to an empty string ("").
From the documentation:
A PVC with its storageClassName set equal to "" is always interpreted to be requesting a PV with no class, so it can only be bound to PVs with no class (no annotation or one set equal to ""). A PVC with no storageClassName is not quite the same ...
So create something like this (I've also added labels and a selector to make sure the intended PV is paired up with the PVC; you might not need that constraint):
apiVersion: v1
kind: PersistentVolume
metadata:
  name: mypv-shared
  labels:
    name: mypv-shared
spec:
  accessModes:
    - ReadWriteMany
  capacity:
    storage: 5Gi
  hostPath:
    path: /data/mypv-shared/
---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: mypv-shared-claim
spec:
  storageClassName: ""
  selector:
    matchLabels:
      name: mypv-shared
  accessModes:
    - ReadWriteMany
  resources:
    requests:
      storage: 5Gi
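After applying both manifests, a quick check confirms that the claim bound to the hand-created PV rather than a dynamically provisioned one (assuming the default namespace, as in the output above):
kubectl get pv mypv-shared
kubectl get pvc mypv-shared-claim
# The PVC's STATUS should be Bound and its VOLUME column should show mypv-shared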