Is there a way to monitor TLS certificates in Kubernetes using Prometheus?

I want to monitor the TLS certificates in my Kubernetes cluster using Prometheus and get a dashboard in Grafana. I want to track their expiry and receive an alert when a certificate is going to expire within 30 days. I did a lot of research and finally found https://github.com/enix/x509-exporter. How do I use it? Is there any other efficient way to monitor certificate expiry?

DISCLAIMER: I haven't tried x509-exporter myself; I'm just giving a suggestion as per my understanding.
The README seems a bit off; normally the first thing to do would be to create a GitHub issue about it, but no worries, I already raised one here.
I am listing the steps as per my understanding, referring to the usage section.
Use their official Docker image and deploy it as a Deployment on Kubernetes.
Check the sample Kubernetes YAML files for creating a Deployment. Also note that the Deployment YAML should mount the host directory where all the Kubernetes certificates are stored.
As per the documentation, the certificates are usually located at /etc/kubernetes/pki.
The Deployment YAML should contain a command that points the exporter at the directory where the certificates are located, along with any other necessary options, like this:
command: ["x509-exporter"]
args: ["-d", "/etc/kubernetes/pki", "-p", "8091", "--debug"]
Note: here I am running the exporter in debug mode on port 8091; remember to expose this port.
In the Prometheus config, add the x509-exporter endpoint as a scrape target, then plot the metrics in a Grafana dashboard. A sketch of the Deployment and scrape config follows below.
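To make the steps above concrete, here is a minimal Deployment sketch. The image name, namespace, and labels are my assumptions (check the project's README for the real image), and the command/args are taken verbatim from the snippet above, so verify them against the exporter's --help:

apiVersion: apps/v1
kind: Deployment
metadata:
  name: x509-exporter
  namespace: monitoring
  labels:
    app: x509-exporter
spec:
  replicas: 1
  selector:
    matchLabels:
      app: x509-exporter
  template:
    metadata:
      labels:
        app: x509-exporter
    spec:
      # /etc/kubernetes/pki only exists on control-plane nodes, so you may need
      # a nodeSelector/tolerations, or run this as a DaemonSet instead.
      containers:
      - name: x509-exporter
        image: enix/x509-exporter:latest        # assumed image name; pin a real tag
        command: ["x509-exporter"]
        args: ["-d", "/etc/kubernetes/pki", "-p", "8091", "--debug"]
        ports:
        - containerPort: 8091                   # metrics port used in the args above
        volumeMounts:
        - name: pki
          mountPath: /etc/kubernetes/pki
          readOnly: true
      volumes:
      - name: pki
        hostPath:
          path: /etc/kubernetes/pki             # host directory holding the cluster certificates

And a matching scrape config fragment for the last step, assuming the pods are exposed through a Service named x509-exporter in the monitoring namespace:

scrape_configs:
  - job_name: x509-exporter
    static_configs:
      - targets: ['x509-exporter.monitoring.svc:8091']   # assumed Service DNS name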

Another way is to install the x509-exporter using the Helm chart: https://hub.helm.sh/charts/enix/x509-exporter
See the documentation here: https://github.com/enix/helm-charts/tree/master/charts/x509-exporter.
You might also find the following Prometheus alert rules useful (based on the x509-exporter metrics):
check-kubernetes-certificate.rules.yml:
groups:
- name: check-kubernetes-certificate-expiration.rules
  rules:
  - alert: KubernetesCertificateExpiration
    expr: floor((x509_cert_not_after - time()) / 86400) < 90
    for: 5m
    labels:
      severity: warning
    annotations:
      summary: 'Certificate expiration on `{{ $labels.nb_cluster }}`'
      description: 'Certificate `{{ $labels.subject_CN }}` will expire in {{ $value }} days on `{{ $labels.nb_cluster }}`'
  - alert: KubernetesCertificateExpirationCritical
    expr: floor((x509_cert_not_after - time()) / 86400) < 10
    for: 5m
    labels:
      severity: critical
    annotations:
      summary: 'Certificate expiration on `{{ $labels.nb_cluster }}`'
      description: 'Certificate `{{ $labels.subject_CN }}` will expire in {{ $value }} days on `{{ $labels.nb_cluster }}`'
  - alert: KubeletCertificateEmbedded
    expr: x509_cert_not_after{filename="kubelet.conf", embedded_kind="user"}
    for: 5m
    labels:
      severity: warning
    annotations:
      summary: '{{ $labels.instance }}: Embedded certificate in {{ $labels.filename }}'
      description: '{{ $labels.nb_cluster }} has kubelet {{ $labels.subject_CN }} running with an embedded certificate in {{ $labels.filepath }}'
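Since the question asks for an alert 30 days before expiry, the thresholds above (90 and 10 days) can simply be adjusted. A minimal variant using the same metric and labels, to be appended to the rules list above (the alert name is just an example):

  - alert: KubernetesCertificateExpiring30Days
    expr: floor((x509_cert_not_after - time()) / 86400) < 30
    for: 5m
    labels:
      severity: warning
    annotations:
      summary: 'Certificate expiration on `{{ $labels.nb_cluster }}`'
      description: 'Certificate `{{ $labels.subject_CN }}` will expire in {{ $value }} days on `{{ $labels.nb_cluster }}`'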

The official prometheus/blackbox_exporter already exposes the SSL cert expiry info:
Name: "probe_ssl_earliest_cert_expiry",
Help: "Returns earliest SSL cert expiry date",
So all you need is:
Set up blackbox_exporter and Probe rules for the domains you want to monitor (a hedged Probe example follows after the rules below).
You can check my project kehao95/helm-prometheus-exporter to install blackbox_exporter via a Helm chart.
Configure a rule to alert on certificates that are about to expire.
You can configure your PrometheusRule like this (assuming you're using prometheus-operator):
rules:
- alert: TLSCertificateExpiringSoon
  expr: (probe_ssl_earliest_cert_expiry - time()) / 86400 < 45
  labels:
    severity: warning
- alert: TLSCertificateExpiringCritical
  expr: (probe_ssl_earliest_cert_expiry - time()) / 86400 < 30
  labels:
    severity: critical
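For the Probe step, here is a minimal sketch assuming the exporter is reachable in-cluster at blackbox-exporter.monitoring.svc:9115 and that the default http_2xx module is configured (adjust the names, namespace, and labels to whatever your Prometheus CR's probeSelector expects):

apiVersion: monitoring.coreos.com/v1
kind: Probe
metadata:
  name: tls-expiry-probe
  namespace: monitoring
spec:
  prober:
    url: blackbox-exporter.monitoring.svc:9115   # assumed Service name and port
  module: http_2xx                               # assumed module from the default blackbox config
  targets:
    staticConfig:
      static:
        - https://example.com                    # the endpoints whose certificates you care about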

Related

Forward Flex Gateway Logs to Splunk

I have an instance of MuleSoft's Flex Gateway (v1.2.0) installed on a Linux machine in a Podman container. I am trying to forward container logs as well as API logs to Splunk. Below is my log.yaml file in the /home/username/app folder. I am not sure what I am doing wrong, but the logs are not getting forwarded to Splunk.
apiVersion: gateway.mulesoft.com/v1alpha1
kind: Configuration
metadata:
  name: logging-config
spec:
  logging:
    outputs:
      - name: default
        type: splunk
        parameters:
          host: <instance-name>.splunkcloud.com
          port: "443"
          splunk_token: xxxxx-xxxxx-xxxx-xxxx
          tls: "on"
          tls.verify: "off"
          splunk_send_raw: "on"
    runtimeLogs:
      logLevel: info
      outputs:
        - default
    accessLogs:
      outputs:
        - default
Please advise.
The endpoint for Splunk's HTTP Event Collector (HEC) is https://http-input.<instance-name>.splunkcloud.com:443/services/collector/raw. If you're using a free trial of Splunk Cloud then change the port number to 8088. See https://docs.splunk.com/Documentation/Splunk/latest/Data/UsetheHTTPEventCollector#Send_data_to_HTTP_Event_Collector_on_Splunk_Cloud_Platform for details.
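In practice that would presumably mean pointing the host parameter at the HEC hostname rather than the bare instance name. A hedged sketch of the outputs block only (I have not verified exactly how Flex Gateway combines host, port, and path, so treat these values as placeholders):

outputs:
  - name: default
    type: splunk
    parameters:
      host: http-input.<instance-name>.splunkcloud.com   # HEC host from the answer above
      port: "443"                                        # 8088 on a Splunk Cloud free trial
      splunk_token: xxxxx-xxxxx-xxxx-xxxx
      tls: "on"
      tls.verify: "off"
      splunk_send_raw: "on"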
I managed to get this to work. The issue was that I had to give full permissions to the app folder using the chmod command. After that was done, the fluent-bit.conf file had an entry for Splunk and logs started flowing.

RKE2 Authorized endpoint configuration help required

I have a Rancher 2.6.67 server and an RKE2 downstream cluster. The cluster was created without an authorized cluster endpoint. The article "How to add an authorized cluster endpoint to an RKE2 cluster created by Rancher" describes how to add it to an existing cluster; although the answer looks promising, I must still be missing some detail, because it does not work for me.
Here is what I did:
Created the /var/lib/rancher/rke2/kube-api-authn-webhook.yaml file with the following contents:
apiVersion: v1
kind: Config
clusters:
- name: Default
  cluster:
    insecure-skip-tls-verify: true
    server: http://127.0.0.1:6440/v1/authenticate
users:
- name: Default
  user:
    insecure-skip-tls-verify: true
current-context: webhook
contexts:
- name: webhook
  context:
    user: Default
    cluster: Default
and added
"kube-apiserver-arg": [
  "authentication-token-webhook-config-file=/var/lib/rancher/rke2/kube-api-authn-webhook.yaml"
]
to the /etc/rancher/rke2/config.yaml.d/50-rancher.yaml file.
After restarting rke2-server I found the network configuration tab in Rancher and was able to enable the authorized endpoint. This is where my success ends.
I tried creating a ServiceAccount and using its secret for token authorization, but it failed when connecting directly to the API endpoint on the master.
kube-api-auth pod logs this:
time="2022-10-06T08:42:27Z" level=error msg="found 1 parts of token"
time="2022-10-06T08:42:27Z" level=info msg="Processing v1Authenticate request..."
Also the log is full of messages like this:
E1006 09:04:07.868108 1 reflector.go:139] pkg/mod/github.com/rancher/client-go#v1.22.3-rancher.1/tools/cache/reflector.go:168: Failed to watch *v3.ClusterAuthToken: failed to list *v3.ClusterAuthToken: the server could not find the requested resource (get clusterauthtokens.meta.k8s.io)
E1006 09:04:40.778350 1 reflector.go:139] pkg/mod/github.com/rancher/client-go#v1.22.3-rancher.1/tools/cache/reflector.go:168: Failed to watch *v3.ClusterAuthToken: failed to list *v3.ClusterAuthToken: the server could not find the requested resource (get clusterauthtokens.meta.k8s.io)
E1006 09:04:45.171554 1 reflector.go:139] pkg/mod/github.com/rancher/client-go#v1.22.3-rancher.1/tools/cache/reflector.go:168: Failed to watch *v3.ClusterUserAttribute: failed to list *v3.ClusterUserAttribute: the server could not find the requested resource (get clusteruserattributes.meta.k8s.io)
I found that SA tokens will not work this way so I tried to use a rancher user token, but that fails as well:
time="2022-10-06T08:37:34Z" level=info msg=" ...looking up token for kubeconfig-user-qq9nrc86vv"
time="2022-10-06T08:37:34Z" level=error msg="clusterauthtokens.cluster.cattle.io \"cattle-system/kubeconfig-user-qq9nrc86vv\" not found"
Checking the cattle-system namespace, there are no ServiceAccount or secret entries corresponding to the users created in Rancher; however, I found related ServiceAccount and secret entries in cattle-impersonation-system.
I tried creating a new user, but that too only resulted in new entries in the cattle-impersonation-system namespace, so I presume kube-api-auth wrongly assumes the secrets are located in the cattle-system namespace.
Now the questions:
Can I authenticate with the downstream RKE2 cluster using normal ServiceAccount tokens (not ones created through the Rancher server)? If so, how?
What did I do wrong when adding the webhook authentication configuration? How do I make it work?
I noticed that since I made the modifications described above, I cannot download the kubeconfig file from the Rancher UI for this cluster. What went wrong there?
Thanks in advance for any advice.

Secure mTLS communication within Istio-knative services + external requests

We are converting existing Kubernetes services to use Istio and Knative. The services receive requests from external users as well as from within the cluster. We are trying to set up an Istio AuthorizationPolicy to achieve the following requirements:
Certain paths (like docs/healthchecks) should not require any special header and must be accessible from anywhere.
Health and metric collection paths required by Knative must be accessible only by the Knative controllers.
Any request coming from outside the cluster (basically through knative-serving/knative-ingress-gateway) must contain a key header matching a pre-shared key.
Any request coming from any service within the cluster can access all the paths.
Below is a sample of what I am trying. I am able to get the first three requirements working, but not the last one...
apiVersion: security.istio.io/v1beta1
kind: AuthorizationPolicy
metadata:
  name: my-svc
  namespace: my-ns
spec:
  selector:
    matchLabels:
      serving.knative.dev/service: my-svc
  action: "ALLOW"
  rules:
  - to:
    - operation:
        methods:
        - "GET"
        paths:
        - "/docs"
        - "/openapi.json"
        - "/redoc"
        - "/rest/v1/healthz"
  - to:
    - operation:
        methods:
        - "GET"
        paths:
        - "/healthz*"
        - "/metrics*"
    when:
    - key: "request.headers[User-Agent]"
      values:
      - "Knative-Activator-Probe"
      - "Go-http-client/1.1"
  - to:
    - operation:
        paths:
        - "/rest/v1/myapp*"
    when:
    - key: "request.headers[my-key]"
      values:
      - "asjhfhjgdhjsfgjhdgsfjh"
  - from:
    - source:
        namespaces:
        - "*"
We have made no changes to the mTLS configuration provided by default by the Istio/Knative setup, so assume that the mTLS mode is currently PERMISSIVE.
Details of the tech stack involved:
AWS EKS - Version 1.21
Knative Serving - Version 1.1 (with Istio 1.11.5)
I'm not an Istio expert, but you might be able to express the last policy based either on the ingress gateway (have one which listens only on a ClusterIP address) or on the source IP being within the cluster. For the latter, I'd want to test that Istio is using the actual source IP and not substituting in the Forwarded header's IP address (a different but reasonable configuration).
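If the source-IP approach works in your environment, the last rule could be expressed with ipBlocks instead of a namespace wildcard. A sketch, where 10.0.0.0/8 is only a stand-in for your cluster's actual pod/VPC CIDR:

  - from:
    - source:
        ipBlocks:
        - "10.0.0.0/8"   # replace with your cluster's pod or VPC CIDR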

Sharing Acme configuration for multiple Traefik services

I have a server running Docker containers with Traefik. Let's say the machine's hostname is machine1.example.com, and each service runs as a subdomain, e.g. srv1.machine1.example.com, srv2.machine1.example.com, srv3.machine1.example.com....
I want to have LetsEncrypt generate a Wildcard certificate for *.machine1.example.com and use it for all of the services instead of generating a separate certificate for each service.
The annoyance is that I have to put the configuration lines into every single service's labels:
labels:
  - traefik.http.routers.srv1.rule=Host(`srv1.machine1.example.com`)
  - traefik.http.routers.srv1.tls=true
  - traefik.http.routers.srv1.tls.certresolver=myresolver
  - traefik.http.routers.srv1.tls.domains[0].main=machine1.example.com
  - traefik.http.routers.srv1.tls.domains[0].sans=*.machine1.example.com

labels:
  - traefik.http.routers.srv2.rule=Host(`srv2.machine1.example.com`)
  - traefik.http.routers.srv2.tls=true
  - traefik.http.routers.srv2.tls.certresolver=myresolver
  - traefik.http.routers.srv2.tls.domains[0].main=machine1.example.com
  - traefik.http.routers.srv2.tls.domains[0].sans=*.machine1.example.com
# etc.
This gets to be a lot of seemingly needless boilerplate.
I tried to work around it (in a way that is still ugly and annoying, but less so) by using the templating feature in the file provider, like this:
[http]
  [http.routers]
  {{ range $i, $e := list "srv1" "srv2" }}
  [http.routers."{{ $e }}".tls]
    certResolver = "letsencrypt"
    [[http.routers."{{ $e }}".tls.domains]]
      main = "machine1.example.com"
      sans = ["*.machine1.example.com"]
  {{ end }}
That did not work because the routers created here are srv1#file, srv2#file instead of srv1#docker, srv2#docker which are created by the docker-compose configuration.
Is there any way to specify this configuration only once and have it apply to multiple services?
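One direction worth exploring, as a sketch only and assuming a Traefik v2 release where TLS can be configured on an entrypoint in the static configuration: declare the resolver and wildcard domains once on the HTTPS entrypoint, so routers using that entrypoint pick up TLS without repeating the domain labels.

# traefik.yml (static configuration) - hedged sketch
entryPoints:
  websecure:
    address: ":443"
    http:
      tls:
        certResolver: myresolver
        domains:
          - main: machine1.example.com
            sans:
              - "*.machine1.example.com"

With that in place, each service's labels would in principle only need the Host rule (e.g. traefik.http.routers.srv1.rule), but verify this against your Traefik version's documentation.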

Can you configure the ElasticBeanstalk load-balanced SSL cert via the .ebextensions file?

We have an AWS ElasticBeanstalk application. There are various environments, some load balanced some not.
At the moment, the SSL for the load balanced environments is configured manually in the console, whereas the single instance ones are configured via the .ebextensions file (and only for the single instance deployments).
Is there a way to configure the SSL for load balancers via the .ebextensions file as well, so we can keep it all in one place, and automate it?
I have not tried this yet, but while reading the documentation I discovered that it is possible to automate it.
If you have any luck following the instructions in the documentation, please let me know.
Update:
I actually tested it, and yes, it is possible. Here is a configuration example:
option_settings:
  - namespace: aws:elb:listener:443
    option_name: ListenerProtocol
    value: HTTPS
  - namespace: aws:elb:listener:443
    option_name: InstancePort
    value: 80
  - namespace: aws:elb:listener:443
    option_name: InstanceProtocol
    value: HTTP
  - namespace: aws:elb:listener:443
    option_name: SSLCertificateId
    value: arn:aws:iam::<your arn cert id here>
  - namespace: aws:elb:listener:80
    option_name: ListenerEnabled
    value: true
  - namespace: aws:elb:listener:443
    option_name: ListenerEnabled
    value: true