I have a working EKS cluster. It is using an ALB for ingress.
When I apply a service and then an ingress, most of these work as expected. However, some target groups eventually end up with no registered targets. If I look up the endpoint IPs (kubectl describe svc my-service-name) and manually register them in the target group, the pods are reachable again, but that's not a sustainable process.
Any ideas on what might be happening? Why doesn't EKS keep the target groups up to date as pods cycle?
Each service (secrets, deployment, service, and ingress) consists of a set of .yaml files applied like this:
deploy.sh
#!/bin/bash
set -e
kubectl apply -f ./secretsMap.yaml
kubectl apply -f ./configMap.yaml
kubectl apply -f ./deployment.yaml
kubectl apply -f ./service.yaml
kubectl apply -f ./ingress.yaml
service.yaml
apiVersion: v1
kind: Service
metadata:
  name: "site-bob"
  namespace: "next-sites"
spec:
  ports:
    - port: 80
      targetPort: 3000
      protocol: TCP
  type: NodePort
  selector:
    app: "site-bob"
ingress.yaml
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: "site-bob"
  namespace: "next-sites"
  annotations:
    kubernetes.io/ingress.class: alb
    alb.ingress.kubernetes.io/tags: Environment=Production,Group=api
    alb.ingress.kubernetes.io/backend-protocol: HTTP
    alb.ingress.kubernetes.io/ip-address-type: ipv4
    alb.ingress.kubernetes.io/target-type: ip
    alb.ingress.kubernetes.io/scheme: internet-facing
    alb.ingress.kubernetes.io/listen-ports: '[{"HTTP":80},{"HTTPS":443}]'
    alb.ingress.kubernetes.io/load-balancer-name: eks-ingress-1
    alb.ingress.kubernetes.io/group.name: eks-ingress-1
    alb.ingress.kubernetes.io/certificate-arn: arn:aws:acm:us-east-2:402995436123:certificate/9db9dce3-055d-4655-842e-xxxxx
    alb.ingress.kubernetes.io/healthcheck-port: traffic-port
    alb.ingress.kubernetes.io/healthcheck-path: /
    alb.ingress.kubernetes.io/healthcheck-interval-seconds: '30'
    alb.ingress.kubernetes.io/healthcheck-timeout-seconds: '16'
    alb.ingress.kubernetes.io/success-codes: 200,201
    alb.ingress.kubernetes.io/healthy-threshold-count: '2'
    alb.ingress.kubernetes.io/unhealthy-threshold-count: '2'
    alb.ingress.kubernetes.io/load-balancer-attributes: idle_timeout.timeout_seconds=60
    alb.ingress.kubernetes.io/target-group-attributes: deregistration_delay.timeout_seconds=30
    alb.ingress.kubernetes.io/actions.ssl-redirect: >
      {
        "type": "redirect",
        "redirectConfig": { "Protocol": "HTTPS", "Port": "443", "StatusCode": "HTTP_301" }
      }
    alb.ingress.kubernetes.io/actions.svc-host: >
      {
        "type": "forward",
        "forwardConfig": {
          "targetGroups": [
            { "serviceName": "site-bob", "servicePort": 80, "weight": 20 }
          ],
          "targetGroupStickinessConfig": { "enabled": true, "durationSeconds": 200 }
        }
      }
  labels:
    app: site-bob
spec:
  rules:
    - host: "staging-bob.imgeinc.net"
      http:
        paths:
          - path: /
            pathType: Prefix
            backend:
              service:
                name: ssl-redirect
                port:
                  name: use-annotation
          - backend:
              service:
                name: svc-host
                port:
                  name: use-annotation
            pathType: ImplementationSpecific
Something in my configuration had tagged two security groups as being owned by the cluster. When I checked the load balancer controller logs:
kubectl logs -n kube-system aws-load-balancer-controller-677c7998bb-l7mwb
I saw many lines like:
{"level":"error","ts":1641996465.6707578,"logger":"controller-runtime.manager.controller.targetGroupBinding","msg":"Reconciler error","reconciler group":"elbv2.k8s.aws","reconciler kind":"TargetGroupBinding","name":"k8s-nextsite-sitefest-89a6f0ff0a","namespace":"next-sites","error":"expect exactly one securityGroup tagged with kubernetes.io/cluster/imageinc-next-eks-4KN4v6EX for eni eni-0c5555fb9a87e93ad, got: [sg-04b2754f1c85ac8b9 sg-07b026b037dd4d6a4]"}
sg-07b026b037dd4d6a4 has description: EKS created security group applied to ENI that is attached to EKS Control Plane master nodes, as well as any managed workloads.
sg-04b2754f1c85ac8b9 has description: Security group for all nodes in the cluster.
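To double-check which groups carry the ownership tag, an AWS CLI query along these lines can be used (a sketch only; the cluster name is taken from the log above and the region from this example, so substitute your own):
aws ec2 describe-security-groups \
  --region us-east-2 \
  --filters "Name=tag:kubernetes.io/cluster/imageinc-next-eks-4KN4v6EX,Values=owned" \
  --query 'SecurityGroups[].{Id:GroupId,Name:GroupName,Description:Description}' \
  --output table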
I removed the tag:
{
  Key: 'kubernetes.io/cluster/_cluster name_',
  Value: 'owned'
}
from sg-04b2754f1c85ac8b9
and the TargetGroups started to fill in and everything is now working. Both groups were created and tagged by Terraform. I suspect my worker group configuration is off.
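For reference, removing the tag can also be done from the AWS CLI rather than the console; a sketch using the security group ID and cluster name from above (if the group is managed by Terraform, also drop the tag there so it isn't re-applied on the next apply):
aws ec2 delete-tags \
  --region us-east-2 \
  --resources sg-04b2754f1c85ac8b9 \
  --tags Key=kubernetes.io/cluster/imageinc-next-eks-4KN4v6EX,Value=owned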
I was facing the same issue when creating the cluster with Terraform. Solved it by updating the AWS Load Balancer Controller from 2.3 to 2.4.4.
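For anyone running the controller via Helm, the upgrade is roughly the following (a sketch using the standard eks-charts repository; the exact chart version that ships controller v2.4.4 and your cluster name need to be checked for your setup):
helm repo add eks https://aws.github.io/eks-charts
helm repo update
helm upgrade aws-load-balancer-controller eks/aws-load-balancer-controller \
  --namespace kube-system \
  --set clusterName=<your-cluster-name> \
  --reuse-values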
I'm using Istio 1.5.4 and trying to apply the example referenced here:
https://istio.io/latest/docs/tasks/security/authentication/authn-policy/#end-user-authentication
Everything works as expected until defining the AuthorizationPolicy - the moment I introduce it I get a 502 Bad Gateway error, regardless of whether I provide a valid JWT token or not.
On a secondary note, I'm able to get the AuthorizationPolicy to work properly if I apply the example at my own service's namespace level. Then RequestAuthentication + AuthorizationPolicy work as expected; however, I run into a different roadblock where internal services now also require a valid JWT token.
authentication/authorization internal service issue
I've discovered that the 502 is a result of my load balancer health check failing due to the AuthorizationPolicy applied. Adding a condition on the User-Agent header for my health check probe seems to do the trick, but then I'm back to the net effect where requests with no token still get through.
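Roughly, the workaround I mean looks like this - an extra ALLOW rule keyed on the health checker's User-Agent (a sketch only; the selector and the GoogleHC value are placeholders for whatever my workload labels and load balancer probe actually use):
apiVersion: security.istio.io/v1beta1
kind: AuthorizationPolicy
metadata:
  name: require-jwt
  namespace: default
spec:
  selector:
    matchLabels:
      app: frontend
  action: ALLOW
  rules:
    - from:
        - source:
            requestPrincipals: ["*"]
    - when:
        - key: request.headers[User-Agent]
          values: ["GoogleHC/*"]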
Requests with no token are getting through because of how you configured your AuthorizationPolicy; that's how source: requestPrincipals: ["*"] works. Take a look at this example.
RequestAuthentication defines what request authentication methods are supported by a workload. It will reject a request if the request contains invalid authentication information, based on the configured authentication rules. A request that does not contain any authentication credentials will be accepted but will not have any authenticated identity. To restrict access to authenticated requests only, this should be accompanied by an authorization rule. Examples:
Require a JWT for all requests to workloads that have the label app: httpbin
apiVersion: security.istio.io/v1beta1
kind: RequestAuthentication
metadata:
  name: httpbin
  namespace: foo
spec:
  selector:
    matchLabels:
      app: httpbin
  jwtRules:
    - issuer: "issuer-foo"
      jwksUri: https://example.com/.well-known/jwks.json
---
apiVersion: security.istio.io/v1beta1
kind: AuthorizationPolicy
metadata:
  name: httpbin
  namespace: foo
spec:
  selector:
    matchLabels:
      app: httpbin
  rules:
    - from:
        - source:
            requestPrincipals: ["*"]
Use requestPrincipals: ["testing@secure.istio.io/testing@secure.istio.io"] instead, as mentioned here; then it will accept only requests with a token.
apiVersion: security.istio.io/v1beta1
kind: RequestAuthentication
metadata:
  name: frontend
  namespace: default
spec:
  selector:
    matchLabels:
      app: frontend
  jwtRules:
    - issuer: "testing@secure.istio.io"
      jwksUri: "https://raw.githubusercontent.com/istio/istio/release-1.5/security/tools/jwt/samples/jwks.json"
The second resource is an AuthorizationPolicy, which ensures that all requests have a JWT - and rejects requests that do not, returning a 403 error.
apiVersion: security.istio.io/v1beta1
kind: AuthorizationPolicy
metadata:
  name: require-jwt
  namespace: default
spec:
  selector:
    matchLabels:
      app: frontend
  action: ALLOW
  rules:
    - from:
        - source:
            requestPrincipals: ["testing@secure.istio.io/testing@secure.istio.io"]
Once we apply these resources, we can curl the Istio ingress gateway without a JWT, and see that the AuthorizationPolicy is rejecting our request because we did not supply a token:
$ curl ${INGRESS_IP}
RBAC: access denied
Finally, if we curl with a valid JWT, we can successfully reach the frontend via the IngressGateway:
$ curl --header "Authorization: Bearer ${VALID_JWT}" ${INGRESS_IP}
Hello World! /
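For completeness, the variables used in the curl commands can be populated along these lines (a sketch assuming the demo JWT from the Istio 1.5 samples and a LoadBalancer-type istio-ingressgateway; adjust for your environment):
INGRESS_IP=$(kubectl -n istio-system get service istio-ingressgateway \
  -o jsonpath='{.status.loadBalancer.ingress[0].ip}')
VALID_JWT=$(curl -s https://raw.githubusercontent.com/istio/istio/release-1.5/security/tools/jwt/samples/demo.jwt)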
I'm trying to set up a Let's Encrypt certificate on Google Cloud. I recently changed it from the http01 to the dns01 challenge type so that I could create Cloud DNS zones and the ACME challenge TXT record would be added automatically.
Here's my certificate.yaml
apiVersion: certmanager.k8s.io/v1alpha1
kind: Certificate
metadata:
  name: san-tls
  namespace: default
spec:
  secretName: san-tls
  issuerRef:
    name: letsencrypt
  commonName: www.evolut.net
  altNames:
    - portal.evolut.net
  dnsNames:
    - www.evolut.net
    - portal.evolut.net
  acme:
    config:
      - dns01:
          provider: clouddns
        domains:
          - www.evolut.net
          - portal.evolut.net
However, now I get the following error when I run kubectl describe certificate:
Message: DNS names on TLS certificate not up to date: ["portal.evolut.net" "www.evolut.net"]
Reason: DoesNotMatch
Status: False
Type: Ready
More worryingly, when I run kubectl describe order I see the following:
Status:
  Challenges:
    Authz URL:  https://acme-v02.api.letsencrypt.org/acme/authz/redacted
    Config:
      Http 01:
    Dns Name:   portal.evolut.net
    Issuer Ref:
      Kind:     Issuer
      Name:     letsencrypt
    Key:        redacted
    Token:      redacted
    Type:       http-01
    URL:        https://acme-v02.api.letsencrypt.org/acme/challenge/redacted
    Wildcard:   false
    Authz URL:  https://acme-v02.api.letsencrypt.org/acme/authz/redacted
    Config:
      Http 01:
Notice how the Type is always http-01, although in the certificate they are listed under dns01.
This means that the ACME TXT file is never created in Cloud DNS and of course the domains aren't validated.
This seems to be related to an issue with the use of multiple domains. I suggest using two different namespaces. You can check an example at the following link:
Failed to list *v1alpha1.Order: orders.certmanager.k8s.io is forbidden
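In other words, instead of one Certificate carrying both domains, give each namespace its own single-domain Certificate, e.g. (a sketch on the same v1alpha1 API and issuer name as above; the www and portal namespaces are placeholders, and a letsencrypt Issuer has to exist in each of them):
apiVersion: certmanager.k8s.io/v1alpha1
kind: Certificate
metadata:
  name: www-tls
  namespace: www
spec:
  secretName: www-tls
  issuerRef:
    name: letsencrypt
  commonName: www.evolut.net
  dnsNames:
    - www.evolut.net
  acme:
    config:
      - dns01:
          provider: clouddns
        domains:
          - www.evolut.net
---
apiVersion: certmanager.k8s.io/v1alpha1
kind: Certificate
metadata:
  name: portal-tls
  namespace: portal
spec:
  secretName: portal-tls
  issuerRef:
    name: letsencrypt
  commonName: portal.evolut.net
  dnsNames:
    - portal.evolut.net
  acme:
    config:
      - dns01:
          provider: clouddns
        domains:
          - portal.evolut.net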
I am running Istio 1.0.2 and am unable to configure service authorization based on JWT claims against Azure AD.
I have successfully configured and validated Azure AD OIDC JWT end-user authentication, and it works fine.
Now I'd like to configure RBAC authorization using the request.auth.claims["preferred_username"] attribute.
I've created a ServiceRole and ServiceRoleBinding like below:
apiVersion: "rbac.istio.io/v1alpha1"
kind: ServiceRole
metadata:
name: service-reader
namespace: default
spec:
rules:
- services: ["myservice.default.svc.cluster.local"]
methods: ["GET"]
paths: ["*/products"]
---
apiVersion: "rbac.istio.io/v1alpha1"
kind: ServiceRoleBinding
metadata:
name: service-reader-binding
namespace: default
spec:
subjects:
- properties:
source.principal: "*"
request.auth.claims["preferred_username"]: "user#company.com"
roleRef:
kind: ServiceRole
name: "service-reader"
However, I keep getting 403 Forbidden from the service proxy, even though the preferred_username claim in the JWT from the Authorization header is correct.
If I comment out the request.auth.claims["preferred_username"]: "user@company.com" line, the request succeeds.
Can anyone point me in the right direction regarding configuring authorization based on OIDC and JWT?
Never mind. I found the problem.
I was missing the user: "*" check to allow all users.
So under subjects it should say:
subjects:
  - user: "*"
    properties:
      source.principal: "*"
      request.auth.claims["preferred_username"]: "user@company.com"
That fixes it.
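Putting the two snippets together, the full corrected binding reads:
apiVersion: "rbac.istio.io/v1alpha1"
kind: ServiceRoleBinding
metadata:
  name: service-reader-binding
  namespace: default
spec:
  subjects:
    - user: "*"
      properties:
        source.principal: "*"
        request.auth.claims["preferred_username"]: "user@company.com"
  roleRef:
    kind: ServiceRole
    name: "service-reader"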
I have a Kubernetes cluster that currently works with the HAProxy ingress controller (and is working fine). I'm trying Traefik as an ingress controller, but it always returns 404, even for requests that do not return 404 when using the HAProxy ingress controller.
Traefik config --
[entryPoints]
  [entryPoints.http]
    address = "S.S.S.S:80"
  [entryPoints.https]
    address = "S.S.S.S:443"
The (simplified) Ingress object looks like this:
apiVersion: extensions/v1beta1
kind: Ingress
metadata:
  annotations:
    kubernetes.io/ingress.class: nginx
spec:
  rules:
    - host: apifrontend.example.com
      http:
        paths:
          - backend:
              serviceName: apifrontend-web
              servicePort: 80
            path: /
Command line switch --
traefik_linux-amd64-1.5.4 -c /etc/traefik.conf --kubernetes --kubernetes.watch --kubernetes.endpoint=https://Y.Y.Y.Y:8897 --kubernetes.token='XXXXXXXX' --accesslog --loglevel=DEBUG
INFO[2018-03-20T11:48:49+05:30] Using TOML configuration file /etc/traefik.conf
INFO[2018-03-20T11:48:49+05:30] Traefik version v1.5.3 built on 2018-02-27_02:47:04PM
INFO[2018-03-20T11:48:49+05:30]
Stats collection is disabled.
Help us improve Traefik by turning this feature on :)
More details on: https://docs.traefik.io/basics/#collected-data
DEBU[2018-03-20T11:48:49+05:30] Global configuration loaded {"LifeCycle":{"RequestAcceptGraceTimeout":"0s","GraceTimeOut":"10s"},"GraceTimeOut":"0s","Debug":false,"CheckNewVersion":true,"SendAnonymousUsage":false,"AccessLogsFile":"","AccessLog":{"format":"common"},"TraefikLogsFile":"","TraefikLog":null,"LogLevel":"DEBUG","EntryPoints":{"http":{"Network":"","Address":"S.S.S.S:80","TLS":null,"Redirect":null,"Auth":null,"WhitelistSourceRange":null,"Compress":false,"ProxyProtocol":null,"ForwardedHeaders":{"Insecure":true,"TrustedIPs":null}},"https":{"Network":"","Address":"S.S.S.S:443","TLS":null,"Redirect":null,"Auth":null,"WhitelistSourceRange":null,"Compress":false,"ProxyProtocol":null,"ForwardedHeaders":{"Insecure":true,"TrustedIPs":null}}},"Cluster":null,"Constraints":[],"ACME":null,"DefaultEntryPoints":["http"],"ProvidersThrottleDuration":"2s","MaxIdleConnsPerHost":200,"IdleTimeout":"0s","InsecureSkipVerify":false,"RootCAs":null,"Retry":null,"HealthCheck":{"Interval":"30s"},"RespondingTimeouts":null,"ForwardingTimeouts":null,"Web":null,"Docker":null,"File":null,"Marathon":null,"Consul":null,"ConsulCatalog":null,"Etcd":null,"Zookeeper":null,"Boltdb":null,"Kubernetes":{"Watch":true,"Filename":"","Constraints":[],"Trace":false,"DebugLogGeneratedTemplate":false,"Endpoint":"https://Y.Y.Y.Y:8897","Token":"XXXXXXXXXXX","CertAuthFilePath":"","DisablePassHostHeaders":false,"EnablePassTLSCert":false,"Namespaces":null,"LabelSelector":""},"Mesos":null,"Eureka":null,"ECS":null,"Rancher":null,"DynamoDB":null,"ServiceFabric":null,"Rest":null,"API":null,"Metrics":null,"Ping":null}
INFO[2018-03-20T11:48:49+05:30] Preparing server http &{Network: Address:S.S.S.S:80 TLS:<nil> Redirect:<nil> Auth:<nil> WhitelistSourceRange:[] Compress:false ProxyProtocol:<nil> ForwardedHeaders:0xc420671c20} with readTimeout=0s writeTimeout=0s idleTimeout=3m0s
INFO[2018-03-20T11:48:49+05:30] Preparing server https &{Network: Address:S.S.S.S:443 TLS:<nil> Redirect:<nil> Auth:<nil> WhitelistSourceRange:[] Compress:false ProxyProtocol:<nil> ForwardedHeaders:0xc420671c40} with readTimeout=0s writeTimeout=0s idleTimeout=3m0s
INFO[2018-03-20T11:48:49+05:30] Starting server on S.S.S.S:80
INFO[2018-03-20T11:48:49+05:30] Starting server on S.S.S.S:443
INFO[2018-03-20T11:48:49+05:30] Starting provider *kubernetes.Provider {"Watch":true,"Filename":"","Constraints":[],"Trace":false,"DebugLogGeneratedTemplate":false,"Endpoint":"https://Y.Y.Y.Y:8897","Token":"XXXXXXXXXXXXXX","CertAuthFilePath":"","DisablePassHostHeaders":false,"EnablePassTLSCert":false,"Namespaces":null,"LabelSelector":""}
INFO[2018-03-20T11:48:49+05:30] Creating cluster-external Provider client with endpoint https://Y.Y.Y.Y:8897
DEBU[2018-03-20T11:48:49+05:30] Using label selector: ''
DEBU[2018-03-20T11:48:49+05:30] Received Kubernetes event kind *v1.Endpoints
DEBU[2018-03-20T11:48:49+05:30] Configuration received from provider kubernetes: {}
INFO[2018-03-20T11:48:49+05:30] Server configuration reloaded on S.S.S.S:80
INFO[2018-03-20T11:48:49+05:30] Server configuration reloaded on S.S.S.S:443
DEBU[2018-03-20T11:48:49+05:30] Received Kubernetes event kind *v1.Endpoints
DEBU[2018-03-20T11:48:49+05:30] Skipping Kubernetes event kind *v1.Endpoints
DEBU[2018-03-20T11:48:50+05:30] Received Kubernetes event kind *v1.Endpoints
DEBU[2018-03-20T11:48:50+05:30] Skipping Kubernetes event kind *v1.Endpoints
DEBU[2018-03-20T11:48:51+05:30] Received Kubernetes event kind *v1.Secret
DEBU[2018-03-20T11:48:51+05:30] Skipping Kubernetes event kind *v1.Secret
DEBU[2018-03-20T11:48:51+05:30] Received Kubernetes event kind *v1.Secret
DEBU[2018-03-20T11:48:51+05:30] Skipping Kubernetes event kind *v1.Secret
DEBU[2018-03-20T11:48:51+05:30] Received Kubernetes event kind *v1.Secret
DEBU[2018-03-20T11:48:51+05:30] Skipping Kubernetes event kind *v1.Secret
...
...
...
Thanks for any pointers.
Your Ingress object contains an Ingress class annotation specifying Nginx:
kubernetes.io/ingress.class: nginx
This causes Traefik to ignore such objects. What you need to do instead is replace nginx with traefik, or remove the annotation entirely. See also the Traefik documentation in this regard.
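For example, the same simplified Ingress with the class switched over (everything else from the manifest above unchanged):
apiVersion: extensions/v1beta1
kind: Ingress
metadata:
  annotations:
    kubernetes.io/ingress.class: traefik
spec:
  rules:
    - host: apifrontend.example.com
      http:
        paths:
          - backend:
              serviceName: apifrontend-web
              servicePort: 80
            path: /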