EKS Fargate Fluent-Bit multiple Outputs - amazon-eks

I'm running a K8 cluster on Fargate and using FluentBit to send logs to cloudwatch
https://docs.aws.amazon.com/eks/latest/userguide/fargate-logging.html#fargate-logging-troubleshooting
I'm able to send all logs to a cloudwatch group with this configMap
kind: Namespace
apiVersion: v1
metadata:
name: aws-observability
labels:
aws-observability: enabled
---
kind: ConfigMap
apiVersion: v1
metadata:
name: aws-logging
namespace: aws-observability
data:
output.conf: |
[OUTPUT]
Name cloudwatch
Match /var/log/1.log
region us-east-1
log_group_name fluent-bit-cloudwatch
log_stream_prefix from-fluent-bit-
auto_create_group true
Fargate Logging does not support INPUT in fluentBit, only OUTPUT, FILTER, and PARSER
Is it possible to use multiple outputs to send logs from one path to a specific cloudwatch group
something like:
[OUTPUT]
Name cloudwatch
Match /var/log/1.log
region us-east-1
log_group_name fluent-bit-cloudwatch
log_stream_prefix from-fluent-bit-
auto_create_group true
[OUTPUT]
Name cloudwatch
Match /var/log/2.log
region us-east-1
log_group_name fluent-bit-cloudwatch_2
log_stream_prefix from-fluent-bit-
auto_create_group true
Where logs from /var/log/1.log would go to fluent-bit-cloudwatch cloudwatch group
and logs from /var/log/2.log would go to fluent-bit-cloudwatch_2 cloudwatch group

Related

Registered Targets Disappear

I have a working EKS cluster. It is using a ALB for ingress.
When I apply a service and then an ingress most of these work as expected. However some target groups eventually have no registered targets. If I get the service IP address kubectl describe svc my-service-name and manually register the EndPoints in the target group the pods are reachable again but that's not a sustainable process.
Any ideas on what might be happening? Why doesn't EKS find the target groups as pods cycle?
Each service (secrets, deployment, service and ingress consists of a set of .yaml files applied like:
deploy.sh
#!/bin/bash
set -e
kubectl apply -f ./secretsMap.yaml
kubectl apply -f ./configMap.yaml
kubectl apply -f ./deployment.yaml
kubectl apply -f ./service.yaml
kubectl apply -f ./ingress.yaml
service.yaml
apiVersion: v1
kind: Service
metadata:
name: "site-bob"
namespace: "next-sites"
spec:
ports:
- port: 80
targetPort: 3000
protocol: TCP
type: NodePort
selector:
app: "site-bob"
ingress.yaml
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
name: "site-bob"
namespace: "next-sites"
annotations:
kubernetes.io/ingress.class: alb
alb.ingress.kubernetes.io/tags: Environment=Production,Group=api
alb.ingress.kubernetes.io/backend-protocol: HTTP
alb.ingress.kubernetes.io/ip-address-type: ipv4
alb.ingress.kubernetes.io/target-type: ip
alb.ingress.kubernetes.io/scheme: internet-facing
alb.ingress.kubernetes.io/listen-ports: '[{"HTTP":80},{"HTTPS":443}]'
alb.ingress.kubernetes.io/load-balancer-name: eks-ingress-1
alb.ingress.kubernetes.io/group.name: eks-ingress-1
alb.ingress.kubernetes.io/certificate-arn: arn:aws:acm:us-east-2:402995436123:certificate/9db9dce3-055d-4655-842e-xxxxx
alb.ingress.kubernetes.io/healthcheck-port: traffic-port
alb.ingress.kubernetes.io/healthcheck-path: /
alb.ingress.kubernetes.io/healthcheck-interval-seconds: '30'
alb.ingress.kubernetes.io/healthcheck-timeout-seconds: '16'
alb.ingress.kubernetes.io/success-codes: 200,201
alb.ingress.kubernetes.io/healthy-threshold-count: '2'
alb.ingress.kubernetes.io/unhealthy-threshold-count: '2'
alb.ingress.kubernetes.io/load-balancer-attributes: idle_timeout.timeout_seconds=60
alb.ingress.kubernetes.io/target-group-attributes: deregistration_delay.timeout_seconds=30
alb.ingress.kubernetes.io/actions.ssl-redirect: >
{
"type": "redirect",
"redirectConfig": { "Protocol": "HTTPS", "Port": "443", "StatusCode": "HTTP_301"}
}
alb.ingress.kubernetes.io/actions.svc-host: >
{
"type":"forward",
"forwardConfig":{
"targetGroups":[
{
"serviceName":"site-bob",
"servicePort": 80,"weight":20}
],
"targetGroupStickinessConfig":{"enabled":true,"durationSeconds":200}
}
}
labels:
app: site-bob
spec:
rules:
- host: "staging-bob.imgeinc.net"
http:
paths:
- path: /
pathType: Prefix
backend:
service:
name: ssl-redirect
port:
name: use-annotation
- backend:
service:
name: svc-host
port:
name: use-annotation
pathType: ImplementationSpecific
Something in my configuration added tagged two security groups as being owned by the cluster. When I checked the load balancer controller logs:
kubectl logs -n kube-system aws-load-balancer-controller-677c7998bb-l7mwb
I saw many lines like:
{"level":"error","ts":1641996465.6707578,"logger":"controller-runtime.manager.controller.targetGroupBinding","msg":"Reconciler error","reconciler group":"elbv2.k8s.aws","reconciler kind":"TargetGroupBinding","name":"k8s-nextsite-sitefest-89a6f0ff0a","namespace":"next-sites","error":"expect exactly one securityGroup tagged with kubernetes.io/cluster/imageinc-next-eks-4KN4v6EX for eni eni-0c5555fb9a87e93ad, got: [sg-04b2754f1c85ac8b9 sg-07b026b037dd4d6a4]"}
sg-07b026b037dd4d6a4 has description: EKS created security group applied to ENI that is attached to EKS Control Plane master nodes, as well as any managed workloads.
sg-04b2754f1c85ac8b9 has description: Security group for all nodes in the cluster.
I removed the tag:
{
Key: 'kubernetes.io/cluster/_cluster name_',
value:'owned'
}
from sg-04b2754f1c85ac8b9
and the TargetGroups started to fill in and everything is now working. Both groups were created and tagged by Terraform. I suspect my worker group configuration is off.
facing the same issue when creating the cluster with terraform. Solved updating aws load balancer controller from 2.3 to 2.4.4

Assign roles to EKS cluster in manifest file?

I'm new to Kubernetes, and am playing with eksctl to create an EKS cluster in AWS. Here's my simple manifest file
kind: ClusterConfig
apiVersion: eksctl.io/v1alpha5
metadata:
name: sandbox
region: us-east-1
version: "1.18"
managedNodeGroups:
- name: ng-sandbox
instanceType: r5a.xlarge
privateNetworking: true
desiredCapacity: 2
minSize: 1
maxSize: 4
ssh:
allow: true
publicKeyName: my-ssh-key
fargateProfiles:
- name: fp-default
selectors:
# All workloads in the "default" Kubernetes namespace will be
# scheduled onto Fargate:
- namespace: default
# All workloads in the "kube-system" Kubernetes namespace will be
# scheduled onto Fargate:
- namespace: kube-system
- name: fp-sandbox
selectors:
# All workloads in the "sandbox" Kubernetes namespace matching the
# following label selectors will be scheduled onto Fargate:
- namespace: sandbox
labels:
env: sandbox
checks: passed
I created 2 roles, EKSClusterRole for cluster management, and EKSWorkerRole for the worker nodes? Where do I use them in the file? I'm looking at eksctl Config file schema page and it's not clear to me where in manifest file to use them.
As you mentioned, it's in the managedNodeGroups docs
managedNodeGroups:
- ...
iam:
instanceRoleARN: my-role-arn
# or
# instanceRoleName: my-role-name
You should also read about
Creating a cluster with Fargate support using a config file
AWS Fargate

Finding the apiVersion for EKS in a region

I am learning EKS -- and have been provided the following example YAML to create cluster.
Where does the 'apiVersion' in the example YAML derived from? Is it eksctl or eks?
apiVersion: eksctl.io/v1alpha5
kind: ClusterConfig
metadata:
name: my-demo-cluster
region: us-west-2
nodeGroups:
- name: my-demo-workers
instanceType: t3.medium
desiredCapacity: 4
minSize: 1
maxSize: 4
It's a standard approach to version API, and here it's definitely coming from eksctl: https://github.com/weaveworks/eksctl/tree/master/pkg/apis/eksctl.io

DigitalOcean pod has unbound immediate PersistentVolumeClaims

I am trying to run a Redis cluster in Kubernetes in DigitalOcean.
As a poc, I simply tried running an example I found online (https://github.com/sanderploegsma/redis-cluster/blob/master/redis-cluster.yml), which is able to spin up the pods appropriately when running locally using minikube.
However, when running it on Digital Ocean, I always get the following error:
Warning FailedScheduling 3s (x8 over 17s) default-scheduler pod has unbound immediate PersistentVolumeClaims (repeated 4 times)
Given that I am not changing anything, I am not sure why this would not work. Does anyone have any suggestions?
EDIT: some additional info
$ kubectl describe pvc
Name: data-redis-cluster-0
Namespace: default
StorageClass:
Status: Pending
Volume:
Labels: app=redis-cluster
Annotations: <none>
Finalizers: [kubernetes.io/pvc-protection]
Capacity:
Access Modes:
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Normal FailedBinding 3m19s (x3420 over 14h) persistentvolume-controller no persistent volumes available for this claim and no storage class is set
Mounted By: <none>
EDIT: setting the default storage class partially resolved the problem!
However, the node is now not able to find available volumes to bind:
kubectl describe pvc:
Name: data-redis-cluster-0
Namespace: default
StorageClass: local-storage
Status: Pending
Volume:
Labels: app=redis-cluster
Annotations: <none>
Finalizers: [kubernetes.io/pvc-protection]
Capacity:
Access Modes:
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Normal WaitForFirstConsumer 12m (x9 over 13m) persistentvolume-controller waiting for first consumer to be created before binding
Normal WaitForFirstConsumer 3m19s (x26 over 9m34s) persistentvolume-controller waiting for first consumer to be created before binding
kubectl describe pod redis-cluster-0
....
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Warning FailedScheduling 16m (x25 over 17m) default-scheduler 0/5 nodes are available: 1 node(s) had taints that the pod didn't tolerate, 4 node(s) didn't find available persistent volumes to bind.
kubectl describe sc
Name: local-storage
IsDefaultClass: Yes
Annotations: storageclass.kubernetes.io/is-default-class=true
Provisioner: kubernetes.io/no-provisioner
Parameters: <none>
AllowVolumeExpansion: <unset>
MountOptions: <none>
ReclaimPolicy: Delete
VolumeBindingMode: WaitForFirstConsumer
Events: <none>
kubernetes manager pod logs:
I1028 15:30:56.154131 1 event.go:221] Event(v1.ObjectReference{Kind:"StatefulSet", Namespace:"default", Name:"redis-cluster", UID:"7528483e-dac6-11e8-871f-2e55450d570e", APIVersion:"apps/v1", ResourceVersion:"2588806", FieldPath:""}): type: 'Normal' reason: 'SuccessfulCreate' create Claim data-redis-cluster-0 Pod redis-cluster-0 in StatefulSet redis-cluster success
I1028 15:30:56.166649 1 event.go:221] Event(v1.ObjectReference{Kind:"PersistentVolumeClaim", Namespace:"default", Name:"data-redis-cluster-0", UID:"76746506-dac6-11e8-871f-2e55450d570e", APIVersion:"v1", ResourceVersion:"2588816", FieldPath:""}): type: 'Normal' reason: 'WaitForFirstConsumer' waiting for first consumer to be created before binding
I1028 15:30:56.220464 1 event.go:221] Event(v1.ObjectReference{Kind:"StatefulSet", Namespace:"default", Name:"redis-cluster", UID:"7528483e-dac6-11e8-871f-2e55450d570e", APIVersion:"apps/v1", ResourceVersion:"2588806", FieldPath:""}): type: 'Normal' reason: 'SuccessfulCreate' create Pod redis-cluster-0 in StatefulSet redis-cluster successful
I1028 15:30:57.004631 1 event.go:221] Event(v1.ObjectReference{Kind:"PersistentVolumeClaim", Namespace:"default", Name:"data-redis-cluster-0", UID:"76746506-dac6-11e8-871f-2e55450d570e", APIVersion:"v1", ResourceVersion:"2588825", FieldPath:""}): type: 'Normal' reason: 'WaitForFirstConsumer' waiting for first consumer to be created before binding
This:
no storage class is set
And an empty output for kubectl describe sc means that there's no storage class.
I recommend installing the CSI-driver for Digital Ocean. That will create a do-block-storage class using the Kubernetes CSI interface.
Another option is to use local storage. Using a local storage class:
$ cat <<EOF
kind: StorageClass
apiVersion: storage.k8s.io/v1
metadata:
name: local-storage
provisioner: kubernetes.io/no-provisioner
volumeBindingMode: WaitForFirstConsumer
EOF | kubectl apply -f -
Then for either case you may need to set it as a default storage class if you don't specify storageClassName in your PVC:
$ kubectl patch storageclass local-storage -p '{"metadata": {"annotations":{"storageclass.kubernetes.io/is-default-class":"true"}}}'
or
$ kubectl patch storageclass do-block-storage -p '{"metadata": {"annotations":{"storageclass.kubernetes.io/is-default-class":"true"}}}'
It is a statefulSet using PersistentVolumeClaims
You need to configure a default storageClass in your cluster so that the PersistentVolumeClaim can take the storage from there.
In minikube one is already available so it succeeds without error:
C02W84XMHTD5:ucp iahmad$ kubectl get sc --all-namespaces
NAME PROVISIONER AGE
standard (default) k8s.io/minikube-hostpath 7d

Kubernetes PersistentVolume and PersistentVolumeClaim could be causing issues for my pod which crashes while copying logs

I have a PersistentVolume that I specified as the following:
apiVersion: v1
kind: PersistentVolume
metadata:
name: mypv-shared
spec:
accessModes:
- ReadWriteMany
capacity:
storage: 5Gi
hostPath:
path: /data/mypv-shared/
Then I created a PersistentVolumeClaim with the following specifications:
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
name: mypv-shared-claim
spec:
accessModes:
- ReadWriteMany
resources:
requests:
storage: 5Gi
But when I create the PVC, running kubectl get pv shows that it is bound to a randomly generated PV
NAME CAPACITY ACCESSMODES RECLAIMPOLICY STATUS CLAIM STORAGECLASS REASON AGE
pvc-38c77920-a223-11e7-89cc-08002719b642 5Gi RWX Delete Bound default/mypv-shared standard 16m
I believe this is causing issues for my pods when running tests because I am not sure if the pod is correctly mounting the specified directory. My pods crash at the end of the test when trying to copy over the test logs at the end of the run.
Is the cause really the persistentVolume/Claim or should I be looking into something else? Thanks!
Creating the PVC dynamically provisioned a PV instead of using the one you created manually with the hostpath. On the PVC simply set .spec.storageClassName to and an empty string ("")
From the documentation:
A PVC with its storageClassName set equal to "" is always interpreted to be requesting a PV with no class, so it can only be bound to PVs with no class (no annotation or one set equal to ""). A PVC with no storageClassName is not quite the same ...
So create something like this (I've also added labels and selectors to make sure that the intended PV is paired up the PVC; you might not need that constraint):
apiVersion: v1
kind: PersistentVolume
metadata:
name: mypv-shared
labels:
name: mypv-shared
spec:
accessModes:
- ReadWriteMany
capacity:
storage: 5Gi
hostPath:
path: /data/mypv-shared/
---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
name: mypv-shared-claim
spec:
storageClassName: ""
selector:
matchLabels:
name: mypv-shared
accessModes:
- ReadWriteMany
resources:
requests:
storage: 5Gi