K3s Vault Cluster -- http: server gave HTTP response to HTTPS client - ssl

I am trying to set up a 3-node Vault cluster with Raft storage enabled. I am currently at a loss as to why the readiness probe (and also the liveness probe) is returning:
Readiness probe failed: Get "https://10.42.4.82:8200/v1/sys/health?standbyok=true&sealedcode=204&uninitcode=204": http: server gave HTTP response to HTTPS client
I am using Helm 3: helm install vault hashicorp/vault --namespace vault -f override-values.yaml, with the following override-values.yaml:
global:
  enabled: true
  tlsDisable: false

injector:
  enabled: false

server:
  image:
    repository: "hashicorp/vault"
    tag: "1.5.5"

  resources:
    requests:
      memory: 1Gi
      cpu: 2000m
    limits:
      memory: 2Gi
      cpu: 2000m

  readinessProbe:
    enabled: true
    path: "/v1/sys/health?standbyok=true&sealedcode=204&uninitcode=204"
  livenessProbe:
    enabled: true
    path: "/v1/sys/health?standbyok=true"
    initialDelaySeconds: 60

  extraEnvironmentVars:
    VAULT_CACERT: /vault/userconfig/tls-ca/ca.crt

  # extraVolumes is a list of extra volumes to mount. These will be exposed
  # to Vault in the path `/vault/userconfig/<name>/`.
  extraVolumes:
    # holds the cert file and the key file
    - type: secret
      name: tls-server
    # holds the ca certificate
    - type: secret
      name: tls-ca

  auditStorage:
    enabled: true

  standalone:
    enabled: false

  # Run Vault in "HA" mode.
  ha:
    enabled: true
    replicas: 3
    raft:
      enabled: true
      setNodeId: true

      config: |
        ui = true

        listener "tcp" {
          address = "[::]:8200"
          cluster_address = "[::]:8201"
          tls_cert_file = "/vault/userconfig/tls-server/tls.crt"
          tls_key_file = "/vault/userconfig/tls-server/tls.key"
          tls_ca_cert_file = "/vault/userconfig/tls-ca/ca.crt"
        }

        storage "raft" {
          path = "/vault/data"

          retry_join {
            leader_api_addr = "https://vault-0.vault-internal:8200"
            leader_ca_cert_file = "/vault/userconfig/tls-ca/ca.crt"
            leader_client_cert_file = "/vault/userconfig/tls-server/tls.crt"
            leader_client_key_file = "/vault/userconfig/tls-server/tls.key"
          }

          retry_join {
            leader_api_addr = "https://vault-1.vault-internal:8200"
            leader_ca_cert_file = "/vault/userconfig/tls-ca/ca.crt"
            leader_client_cert_file = "/vault/userconfig/tls-server/tls.crt"
            leader_client_key_file = "/vault/userconfig/tls-server/tls.key"
          }

          retry_join {
            leader_api_addr = "https://vault-2.vault-internal:8200"
            leader_ca_cert_file = "/vault/userconfig/tls-ca/ca.crt"
            leader_client_cert_file = "/vault/userconfig/tls-server/tls.crt"
            leader_client_key_file = "/vault/userconfig/tls-server/tls.key"
          }
        }

        service_registration "kubernetes" {}

# Vault UI
ui:
  enabled: true
  serviceType: "ClusterIP"
  serviceNodePort: null
  externalPort: 8200
Output from kubectl describe pod vault-0:
Name: vault-0
Namespace: vault
Priority: 0
Node: node4/10.211.55.7
Start Time: Wed, 11 Nov 2020 15:06:47 +0700
Labels: app.kubernetes.io/instance=vault
app.kubernetes.io/name=vault
component=server
controller-revision-hash=vault-5c4b47bdc4
helm.sh/chart=vault-0.8.0
statefulset.kubernetes.io/pod-name=vault-0
vault-active=false
vault-initialized=false
vault-perf-standby=false
vault-sealed=true
vault-version=1.5.5
Annotations: <none>
Status: Running
IP: 10.42.4.82
IPs:
IP: 10.42.4.82
Controlled By: StatefulSet/vault
Containers:
vault:
Container ID: containerd://6dfde76051f44c22003cc02a880593792d304e74c56d717eef982e0e799672f2
Image: hashicorp/vault:1.5.5
Image ID: docker.io/hashicorp/vault@sha256:90cfeead29ef89fdf04383df9991754f4a54c43b2fb49ba9ff3feb713e5ef1be
Ports: 8200/TCP, 8201/TCP, 8202/TCP
Host Ports: 0/TCP, 0/TCP, 0/TCP
Command:
/bin/sh
-ec
Args:
cp /vault/config/extraconfig-from-values.hcl /tmp/storageconfig.hcl;
[ -n "${HOST_IP}" ] && sed -Ei "s|HOST_IP|${HOST_IP?}|g" /tmp/storageconfig.hcl;
[ -n "${POD_IP}" ] && sed -Ei "s|POD_IP|${POD_IP?}|g" /tmp/storageconfig.hcl;
[ -n "${HOSTNAME}" ] && sed -Ei "s|HOSTNAME|${HOSTNAME?}|g" /tmp/storageconfig.hcl;
[ -n "${API_ADDR}" ] && sed -Ei "s|API_ADDR|${API_ADDR?}|g" /tmp/storageconfig.hcl;
[ -n "${TRANSIT_ADDR}" ] && sed -Ei "s|TRANSIT_ADDR|${TRANSIT_ADDR?}|g" /tmp/storageconfig.hcl;
[ -n "${RAFT_ADDR}" ] && sed -Ei "s|RAFT_ADDR|${RAFT_ADDR?}|g" /tmp/storageconfig.hcl;
/usr/local/bin/docker-entrypoint.sh vault server -config=/tmp/storageconfig.hcl
State: Running
Started: Wed, 11 Nov 2020 15:25:21 +0700
Last State: Terminated
Reason: Completed
Exit Code: 0
Started: Wed, 11 Nov 2020 15:19:10 +0700
Finished: Wed, 11 Nov 2020 15:20:20 +0700
Ready: False
Restart Count: 8
Limits:
cpu: 2
memory: 2Gi
Requests:
cpu: 2
memory: 1Gi
Liveness: http-get https://:8200/v1/sys/health%3Fstandbyok=true delay=60s timeout=3s period=5s #success=1 #failure=2
Readiness: http-get https://:8200/v1/sys/health%3Fstandbyok=true&sealedcode=204&uninitcode=204 delay=5s timeout=3s period=5s #success=1 #failure=2
Environment:
HOST_IP: (v1:status.hostIP)
POD_IP: (v1:status.podIP)
VAULT_K8S_POD_NAME: vault-0 (v1:metadata.name)
VAULT_K8S_NAMESPACE: vault (v1:metadata.namespace)
VAULT_ADDR: https://127.0.0.1:8200
VAULT_API_ADDR: https://$(POD_IP):8200
SKIP_CHOWN: true
SKIP_SETCAP: true
HOSTNAME: vault-0 (v1:metadata.name)
VAULT_CLUSTER_ADDR: https://$(HOSTNAME).vault-internal:8201
VAULT_RAFT_NODE_ID: vault-0 (v1:metadata.name)
HOME: /home/vault
VAULT_CACERT: /vault/userconfig/tls-ca/ca.crt
Mounts:
/home/vault from home (rw)
/var/run/secrets/kubernetes.io/serviceaccount from vault-token-lfgnj (ro)
/vault/audit from audit (rw)
/vault/config from config (rw)
/vault/data from data (rw)
/vault/userconfig/tls-ca from userconfig-tls-ca (ro)
Conditions:
Type Status
Initialized True
Ready False
ContainersReady False
PodScheduled True
Volumes:
data:
Type: PersistentVolumeClaim (a reference to a PersistentVolumeClaim in the same namespace)
ClaimName: data-vault-0
ReadOnly: false
audit:
Type: PersistentVolumeClaim (a reference to a PersistentVolumeClaim in the same namespace)
ClaimName: audit-vault-0
ReadOnly: false
config:
Type: ConfigMap (a volume populated by a ConfigMap)
Name: vault-config
Optional: false
userconfig-tls-ca:
Type: Secret (a volume populated by a Secret)
SecretName: tls-ca
Optional: false
home:
Type: EmptyDir (a temporary directory that shares a pod's lifetime)
Medium:
SizeLimit: <unset>
vault-token-lfgnj:
Type: Secret (a volume populated by a Secret)
SecretName: vault-token-lfgnj
Optional: false
QoS Class: Burstable
Node-Selectors: <none>
Tolerations: node.kubernetes.io/not-ready:NoExecute op=Exists for 300s
node.kubernetes.io/unreachable:NoExecute op=Exists for 300s
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Normal Scheduled 18m default-scheduler Successfully assigned vault/vault-0 to node4
Warning Unhealthy 17m (x2 over 17m) kubelet Liveness probe failed: Get "https://10.42.4.82:8200/v1/sys/health?standbyok=true": http: server gave HTTP response to HTTPS client
Normal Killing 17m kubelet Container vault failed liveness probe, will be restarted
Normal Pulled 17m (x2 over 18m) kubelet Container image "hashicorp/vault:1.5.5" already present on machine
Normal Created 17m (x2 over 18m) kubelet Created container vault
Normal Started 17m (x2 over 18m) kubelet Started container vault
Warning Unhealthy 13m (x56 over 18m) kubelet Readiness probe failed: Get "https://10.42.4.82:8200/v1/sys/health?standbyok=true&sealedcode=204&uninitcode=204": http: server gave HTTP response to HTTPS client
Warning BackOff 3m41s (x31 over 11m) kubelet Back-off restarting failed container
Logs from vault-0:
2020-11-12T05:50:43.554426582Z ==> Vault server configuration:
2020-11-12T05:50:43.554524646Z
2020-11-12T05:50:43.554574639Z Api Address: https://10.42.4.85:8200
2020-11-12T05:50:43.554586234Z Cgo: disabled
2020-11-12T05:50:43.554596948Z Cluster Address: https://vault-0.vault-internal:8201
2020-11-12T05:50:43.554608637Z Go Version: go1.14.7
2020-11-12T05:50:43.554678454Z Listener 1: tcp (addr: "[::]:8200", cluster address: "[::]:8201", max_request_duration: "1m30s", max_request_size: "33554432", tls: "disabled")
2020-11-12T05:50:43.554693734Z Log Level: info
2020-11-12T05:50:43.554703897Z Mlock: supported: true, enabled: false
2020-11-12T05:50:43.554713272Z Recovery Mode: false
2020-11-12T05:50:43.554722579Z Storage: raft (HA available)
2020-11-12T05:50:43.554732788Z Version: Vault v1.5.5
2020-11-12T05:50:43.554769315Z Version Sha: f5d1ddb3750e7c28e25036e1ef26a4c02379fc01
2020-11-12T05:50:43.554780425Z
2020-11-12T05:50:43.672225223Z ==> Vault server started! Log data will stream in below:
2020-11-12T05:50:43.672519986Z
2020-11-12T05:50:43.673078706Z 2020-11-12T05:50:43.543Z [INFO] proxy environment: http_proxy= https_proxy= no_proxy=
2020-11-12T05:51:57.838970945Z ==> Vault shutdown triggered
I am running a 6-node Rancher K3s cluster (v1.19.3+k3s2) on my Mac.
Any help would be appreciated.

Related

Problem scheduling pod in Fargate with error "Pod not supported on Fargate: volumes not supported: dir-authentication not supported because:"

I am new to AWS EKS Fargate.
I created a cluster on AWS EKS Fargate and then proceeded to install a Helm chart; the pods are all in Pending state. Looking at the pod description, I noticed some errors, shown below.
eksctl create cluster -f cluster-fargate.yaml
k -n bd describe pod bd-blackduck-authentication-6c8ff5cc85-jwr8m
Name: bd-blackduck-authentication-6c8ff5cc85-jwr8m
Namespace: bd
Priority: 2000001000
Priority Class Name: system-node-critical
Node: <none>
Labels: app=blackduck
component=authentication
eks.amazonaws.com/fargate-profile=fp-bd
name=bd
pod-template-hash=6c8ff5cc85
version=2021.10.5
Annotations: checksum/blackduck-config: 6c1796e5e4218c71ea2ae7a1249fefbb6f7c216f702ea38919a0bb9751b06922
checksum/postgres-config: f21777c0b5bf24b5535a5b4a8dbf98a5df9c9dd2f4a48e5219dcccf46301a982
kubernetes.io/psp: eks.privileged
Status: Pending
IP:
IPs: <none>
Controlled By: ReplicaSet/bd-blackduck-authentication-6c8ff5cc85
Init Containers:
bd-blackduck-postgres-waiter:
Image: docker.io/blackducksoftware/blackduck-postgres-waiter:1.0.0
Port: <none>
Host Port: <none>
Environment Variables from:
bd-blackduck-config ConfigMap Optional: false
Environment:
POSTGRES_HOST: <set to the key 'HUB_POSTGRES_HOST' of config map 'bd-blackduck-db-config'> Optional: false
POSTGRES_PORT: <set to the key 'HUB_POSTGRES_PORT' of config map 'bd-blackduck-db-config'> Optional: false
POSTGRES_USER: <set to the key 'HUB_POSTGRES_USER' of config map 'bd-blackduck-db-config'> Optional: false
Mounts:
/var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-85q7d (ro)
Containers:
authentication:
Image: docker.io/blackducksoftware/blackduck-authentication:2021.10.5
Port: 8443/TCP
Host Port: 0/TCP
Limits:
memory: 1Gi
Requests:
memory: 1Gi
Liveness: exec [/usr/local/bin/docker-healthcheck.sh https://127.0.0.1:8443/api/health-checks/liveness /opt/blackduck/hub/hub-authentication/security/root.crt /opt/blackduck/hub/hub-authentication/security/blackduck_system.crt /opt/blackduck/hub/hub-authentication/security/blackduck_system.key] delay=240s timeout=10s period=30s #success=1 #failure=10
Environment Variables from:
bd-blackduck-db-config ConfigMap Optional: false
bd-blackduck-config ConfigMap Optional: false
Environment:
HUB_MAX_MEMORY: 512m
DD_ENABLED: false
HUB_MANAGEMENT_ENDPOINT_PROMETHEUS_ENABLED: false
Mounts:
/opt/blackduck/hub/hub-authentication/ldap from dir-authentication (rw)
/opt/blackduck/hub/hub-authentication/security from dir-authentication-security (rw)
/tmp/secrets/HUB_POSTGRES_ADMIN_PASSWORD_FILE from db-passwords (rw,path="HUB_POSTGRES_ADMIN_PASSWORD_FILE")
/tmp/secrets/HUB_POSTGRES_USER_PASSWORD_FILE from db-passwords (rw,path="HUB_POSTGRES_USER_PASSWORD_FILE")
/var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-85q7d (ro)
Conditions:
Type Status
PodScheduled False
Volumes:
dir-authentication:
Type: PersistentVolumeClaim (a reference to a PersistentVolumeClaim in the same namespace)
ClaimName: bd-blackduck-authentication
ReadOnly: false
db-passwords:
Type: Secret (a volume populated by a Secret)
SecretName: bd-blackduck-db-creds
Optional: false
dir-authentication-security:
Type: EmptyDir (a temporary directory that shares a pod's lifetime)
Medium:
SizeLimit: <unset>
kube-api-access-85q7d:
Type: Projected (a volume that contains injected data from multiple sources)
TokenExpirationSeconds: 3607
ConfigMapName: kube-root-ca.crt
ConfigMapOptional: <nil>
DownwardAPI: true
QoS Class: Burstable
Node-Selectors: <none>
Tolerations: node.kubernetes.io/not-ready:NoExecute op=Exists for 300s
node.kubernetes.io/unreachable:NoExecute op=Exists for 300s
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Warning FailedScheduling 58s fargate-scheduler Pod not supported on Fargate: volumes not supported: dir-authentication not supported because: PVC bd-blackduck-authentication not bound
My storageClass is currently set to gp2 in my values.yaml.
What can I do next to troubleshoot this?
Currently, Fargate does not support PersistentVolumes backed by EBS. You can use EFS instead; a sketch follows.
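For illustration, a minimal sketch of statically provisioning an EFS volume (this assumes the AWS EFS CSI driver is available on the cluster and an EFS filesystem already exists; the filesystem ID, names, and size below are placeholders):
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: efs-sc
provisioner: efs.csi.aws.com
---
apiVersion: v1
kind: PersistentVolume
metadata:
  name: bd-blackduck-authentication-pv
spec:
  capacity:
    storage: 5Gi            # EFS ignores capacity, but the field is required
  volumeMode: Filesystem
  accessModes:
    - ReadWriteMany
  persistentVolumeReclaimPolicy: Retain
  storageClassName: efs-sc
  csi:
    driver: efs.csi.aws.com
    volumeHandle: fs-12345678   # placeholder EFS filesystem ID
The chart's PVC (bd-blackduck-authentication) would then need its storageClassName switched from gp2 to efs-sc so it can bind.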

Why Aren't My Environment Variables Set in Kubernetes Pods From ConfigMap?

I have the following configmap spec:
apiVersion: v1
data:
  MY_NON_SECRET: foo
  MY_OTHER_NON_SECRET: bar
kind: ConfigMap
metadata:
  name: web-configmap
  namespace: default
$ kubectl describe configmap web-configmap
Name: web-configmap
Namespace: default
Labels: <none>
Annotations: <none>
Data
====
MY_NON_SECRET:
----
foo
MY_OTHER_NON_SECRET:
----
bar
Events: <none>
And the following pod spec:
apiVersion: v1
kind: Pod
metadata:
  name: web-pod
spec:
  containers:
  - name: web
    image: kahunacohen/hello-kube:latest
    envFrom:
    - configMapRef:
        name: web-configmap
    ports:
    - containerPort: 3000
$ kubectl describe pod web-deployment-5bb9d846b6-8k2s9
Name: web-deployment-5bb9d846b6-8k2s9
Namespace: default
Priority: 0
Node: minikube/192.168.49.2
Start Time: Mon, 12 Jul 2021 12:22:24 +0300
Labels: app=web-pod
pod-template-hash=5bb9d846b6
service=web-service
Annotations: <none>
Status: Running
IP: 172.17.0.5
IPs:
IP: 172.17.0.5
Controlled By: ReplicaSet/web-deployment-5bb9d846b6
Containers:
web:
Container ID: docker://8de5472c9605e5764276c345865ec52f9ec032e01ed58bc9a02de525af788acf
Image: kahunacohen/hello-kube:latest
Image ID: docker-pullable://kahunacohen/hello-kube@sha256:930dc2ca802bff72ee39604533342ef55e24a34b4a42b9074e885f18789ea736
Port: 3000/TCP
Host Port: 0/TCP
State: Running
Started: Mon, 12 Jul 2021 12:22:27 +0300
Ready: True
Restart Count: 0
Environment: <none>
Mounts:
/var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-tcqwz (ro)
Conditions:
Type Status
Initialized True
Ready True
ContainersReady True
PodScheduled True
Volumes:
kube-api-access-tcqwz:
Type: Projected (a volume that contains injected data from multiple sources)
TokenExpirationSeconds: 3607
ConfigMapName: kube-root-ca.crt
ConfigMapOptional: <nil>
DownwardAPI: true
QoS Class: BestEffort
Node-Selectors: <none>
Tolerations: node.kubernetes.io/not-ready:NoExecute op=Exists for 300s
node.kubernetes.io/unreachable:NoExecute op=Exists for 300s
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Normal Scheduled 19m default-scheduler Successfully assigned default/web-deployment-5bb9d846b6-8k2s9 to minikube
Normal Pulling 19m kubelet Pulling image "kahunacohen/hello-kube:latest"
Normal Pulled 19m kubelet Successfully pulled image "kahunacohen/hello-kube:latest" in 2.3212119s
Normal Created 19m kubelet Created container web
Normal Started 19m kubelet Started container web
The pod has a container running Express.js with this code, which tries to print out the env vars set in the ConfigMap:
const process = require("process");
const express = require("express");
const app = express();

app.get("/", (req, res) => {
  res.send(`<h1>Kubernetes Expressjs Example 0.3</h1>
  <h2>Non-Secret Configuration Example</h2>
  <p>This uses ConfigMaps as env vars.</p>
  <ul>
    <li>MY_NON_SECRET: "${process.env.MY_NON_SECRET}"</li>
    <li>MY_OTHER_NON_SECRET: "${process.env.MY_OTHER_NON_SECRET}"</li>
  </ul>
  `);
});

app.listen(3000, () => {
  console.log("Listening on http://localhost:3000");
});
When I deploy these pods, the env vars are undefined.
When I do $ kubectl exec {POD_NAME} -- env
I don't see my env vars.
What am I doing wrong? I've tried killing the pods and waiting until they restart, then checking again, to no avail.
It looks like your pods are managed by the web-deployment Deployment. You cannot patch such pods directly.
If you run kubectl get pod <pod-name> -n <namespace> -o yaml, you'll see a block called ownerReferences under the metadata section. This tells you who the owner/manager of this pod is.
In the case of a Deployment, the ownership hierarchy is:
Deployment -> ReplicaSet -> Pod
i.e. a Deployment creates a ReplicaSet, and the ReplicaSet in turn creates Pods.
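For example, for the pod in this question the block would look roughly like this (an illustration; the name is taken from the "Controlled By" line in the describe output above, and the uid is omitted):
metadata:
  ownerReferences:
  - apiVersion: apps/v1
    kind: ReplicaSet
    name: web-deployment-5bb9d846b6
    controller: true
    blockOwnerDeletion: true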
So, if you want to change anything in the pod spec, you should make that change in the Deployment, not in the ReplicaSet or the Pod directly, as those will get overwritten.
Patch your Deployment either by running the following and editing the environment field there:
kubectl edit deployment.apps <deployment-name> -n <namespace>
or update the Deployment YAML with your changes and run:
kubectl apply -f <deployment-yaml-file>
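As a sketch applied to this question, the envFrom block belongs in the Deployment's pod template, roughly like this (the Deployment name, labels, and replica count are inferred from the describe output above):
apiVersion: apps/v1
kind: Deployment
metadata:
  name: web-deployment
spec:
  replicas: 1
  selector:
    matchLabels:
      app: web-pod
  template:
    metadata:
      labels:
        app: web-pod
        service: web-service
    spec:
      containers:
      - name: web
        image: kahunacohen/hello-kube:latest
        envFrom:
        - configMapRef:
            name: web-configmap   # injects MY_NON_SECRET and MY_OTHER_NON_SECRET
        ports:
        - containerPort: 3000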

HashiCorp Vault on k8s: Error initializing listener of type tcp: error loading TLS cert: open : no such file or directory

After configuring self-signed TLS following this Git issue,
and after installing NOT in standalone mode but rather using storage "raft" + HA as in this beginner tutorial,
I'm getting these errors in each installed pod:
Error initializing listener of type tcp: error loading TLS cert: open : no such file or directory
I don't understand... I only created the secret in k8s and didn't upload any files,
so where does it take them from?
What is this path /vault/userconfig/vault-server-tls/?
If I do:
kubectl get secret vault-server-tls -n vault-foo -o yaml
I'm getting:
apiVersion: v1
data:
  vault.ca: LS0t.....GSUNBVEUtLS0tLQo=
  vault.crt: LS0tLS1CRUdJ....LS0tLUVORCBDRVJUSUZJQ0FURS0tLS0tCg==
  vault.key: LS0tLS1CR....S0tDQo=
kind: Secret
metadata:
  annotations:
    kubectl.kubernetes.io/last-applied-configuration: |
      {"apiVersion":"v1","data":{"vault.ca":"LS0tLS1CR...FWS0tLS0tDQo="},"kind":"Secret","metadata":{"annotations":{},"creationTimestamp":"2021-01-21T09:34:31Z","managedFields":[{"apiVersion":"v1","fieldsType":"FieldsV1","fieldsV1":{"f:data":{".":{},"f:vault.ca":{},"f:vault.crt":{},"f:vault.key":{}},"f:type":{}},"manager":"kubectl.exe","operation":"Update","time":"2021-01-21T09:34:31Z"}],"name":"vault-server-tls","namespace":"vault-foo","selfLink":"/api/v1/namespaces/vault-foo/secrets/vault-server-tls","uid":"845b856e-d934-46dd-b094-ca75084542cd"},"type":"Opaque"}
  creationTimestamp: "2021-01-21T09:34:31Z"
  managedFields:
  - apiVersion: v1
    fieldsType: FieldsV1
    fieldsV1:
      f:data:
        .: {}
        f:vault.ca: {}
        f:vault.crt: {}
        f:vault.key: {}
      f:metadata:
        f:annotations:
          .: {}
          f:kubectl.kubernetes.io/last-applied-configuration: {}
      f:type: {}
    manager: kubectl.exe
    operation: Update
    time: "2021-01-21T09:39:10Z"
  name: vault-server-tls
  namespace: vault-foo
  resourceVersion: "62302347"
  selfLink: /api/v1/namespaces/vault-foo/secrets/vault-server-tls
  uid: 845b856e-d934-46dd-b094-ca75084542cd
type: Opaque
The Helm values file:
# Vault Helm Chart Value Overrides
global:
  enabled: true
  tlsDisable: false

injector:
  enabled: true
  # Use the Vault K8s Image https://github.com/hashicorp/vault-k8s/
  image:
    repository: "hashicorp/vault-k8s"
    tag: "latest"

  resources:
    requests:
      memory: 256Mi
      cpu: 250m
    limits:
      memory: 256Mi
      cpu: 250m

server:
  # Use the Enterprise Image
  image:
    repository: "hashicorp/vault-enterprise"
    tag: "1.5.0_ent"

  # These Resource Limits are in line with node requirements in the
  # Vault Reference Architecture for a Small Cluster
  resources:
    requests:
      memory: 8Gi
      cpu: 2000m
    limits:
      memory: 16Gi
      cpu: 2000m

  # For HA configuration and because we need to manually init the vault,
  # we need to define custom readiness/liveness Probe settings
  readinessProbe:
    enabled: true
    path: "/v1/sys/health?standbyok=true&sealedcode=204&uninitcode=204"
  livenessProbe:
    enabled: true
    path: "/v1/sys/health?standbyok=true"
    initialDelaySeconds: 60

  # extraEnvironmentVars is a list of extra environment variables to set with the stateful set. These could be
  # used to include variables required for auto-unseal.
  extraEnvironmentVars:
    VAULT_CACERT: /vault/userconfig/vault-server-tls/vault.crt

  # extraVolumes is a list of extra volumes to mount. These will be exposed
  # to Vault in the path `/vault/userconfig/<name>/`.
  #extraVolumes:
  #  - type: secret
  #    name: tls-server
  #  - type: secret
  #    name: tls-ca
  #  - type: secret
  #    name: kms-creds
  extraVolumes:
    - type: secret
      name: vault-server-tls

  # This configures the Vault Statefulset to create a PVC for audit logs.
  # See https://www.vaultproject.io/docs/audit/index.html to know more
  auditStorage:
    enabled: true

  standalone:
    enabled: false

  # Run Vault in "HA" mode.
  ha:
    enabled: true
    replicas: 3
    raft:
      enabled: true
      setNodeId: true

      config: |
        ui = true

        listener "tcp" {
          address = "[::]:8200"
          cluster_address = "[::]:8201"
          #tls_disable = 1
        }

        storage "raft" {
          path = "/vault/data"

          retry_join {
            leader_api_addr = "http://vault-0.vault-internal:8200"
            leader_ca_cert_file = "/vault/userconfig/vault-server-tls/vault.crt"
            leader_client_cert_file = "/vault/userconfig/vault-server-tls/vault.crt"
            leader_client_key_file = "/vault/userconfig/vault-server-tls/vault.key"
          }

          retry_join {
            leader_api_addr = "http://vault-1.vault-internal:8200"
            leader_ca_cert_file = "/vault/userconfig/vault-server-tls/vault.crt"
            leader_client_cert_file = "/vault/userconfig/vault-server-tlsr/vault.crt"
            leader_client_key_file = "/vault/userconfig/vault-server-tls/vault.key"
          }

          retry_join {
            leader_api_addr = "http://vault-2.vault-internal:8200"
            leader_ca_cert_file = "/vault/userconfig/vault-server-tls/vault.crt"
            leader_client_cert_file = "/vault/userconfig/vault-server-tls/vault.crt"
            leader_client_key_file = "/vault/userconfig/vault-server-tls/vault.key"
          }
        }

        service_registration "kubernetes" {}

# Vault UI
ui:
  enabled: true
  serviceType: "LoadBalancer"
  serviceNodePort: null
  externalPort: 8200

  # For Added Security, edit the below
  #loadBalancerSourceRanges:
  #  - < Your IP RANGE Ex. 10.0.0.0/16 >
  #  - < YOUR SINGLE IP Ex. 1.78.23.3/32 >

debugging cert-manager certificate creation failure on AKS

I'm deploying cert-manager on Azure AKS and trying to have it request a Let's Encrypt certificate. It fails with a certificate signed by unknown authority error, and I am having trouble troubleshooting it further.
I'm not sure whether this is a problem with trusting the LE server, the tunnelfront pod, or maybe an internal AKS self-generated CA. So my questions would be:
how do I force cert-manager to debug (display more info) regarding the certificate it does not trust?
is the problem perhaps occurring regularly, with a known solution?
what steps should be undertaken to debug the issue further?
I have created an issue on jetstack/cert-manager's GitHub page, but was not answered, so I came here.
The whole story is as follows:
Certificates are not created. The following errors are reported:
for the certificate:
Error from server: conversion webhook for &{map[apiVersion:cert-manager.io/v1alpha2 kind:Certificate metadata:map[creationTimestamp:2020-05-13T17:30:48Z generation:1 name:xxx-tls namespace:test ownerReferences:[map[apiVersion:extensions/v1beta1 blockOwnerDeletion:true controller:true kind:Ingress name:xxx-ingress uid:6d73b182-bbce-4834-aee2-414d2b3aa802]] uid:d40bc037-aef7-4139-868f-bd615a423b38] spec:map[dnsNames:[xxx.test.domain.com] issuerRef:map[group:cert-manager.io kind:ClusterIssuer name:letsencrypt-prod] secretName:xxx-tls] status:map[conditions:[map[lastTransitionTime:2020-05-13T18:55:31Z message:Waiting for CertificateRequest "xxx-tls-1403681706" to complete reason:InProgress status:False type:Ready]]]]} failed: Post https://cert-manager-webhook.cert-manager.svc:443/convert?timeout=30s: x509: certificate signed by unknown authority
cert-manager-webhook container:
cert-manager 2020/05/15 14:22:58 http: TLS handshake error from 10.20.0.19:35350: remote error: tls: bad certificate
Where 10.20.0.19 is the IP of tunnelfront pod.
Debugging with https://cert-manager.io/docs/faq/acme/ sort of "fails" when trying kubectl describe order..., since kubectl describe certificaterequest... returns the CSR contents with the error (as above), but not the order ID. For reference, the intended chain of commands is sketched below.
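A sketch of that FAQ's describe chain, following the ACME resource ownership (the certificate and certificaterequest names come from the error above; the order and challenge names are placeholders, since the order ID is exactly what I could not obtain):
kubectl describe certificate xxx-tls -n test
kubectl describe certificaterequest xxx-tls-1403681706 -n test
kubectl describe order <order-name> -n test
kubectl describe challenge <challenge-name> -n test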
Environment details:
Kubernetes version: 1.15.10
Cloud-provider/provisioner : Azure (AKS)
cert-manager version: 0.14.3
Install method: static manifests (see below) + cluster issuer (see below) + regular CRDs (not legacy)
cluster issuer:
apiVersion: cert-manager.io/v1alpha2
kind: ClusterIssuer
metadata:
  name: letsencrypt-prod
  namespace: cert-manager
spec:
  acme:
    server: https://acme-v02.api.letsencrypt.org/directory
    email: x
    privateKeySecretRef:
      name: letsencrypt-prod
    solvers:
    - dns01:
        azuredns:
          clientID: x
          clientSecretSecretRef:
            name: cert-manager-stage
            key: CLIENT_SECRET
          subscriptionID: x
          tenantID: x
          resourceGroupName: dns-stage
          hostedZoneName: x
the manifest:
imagePullSecrets: []
isOpenshift: false
priorityClassName: ""
rbac:
  create: true
podSecurityPolicy:
  enabled: false
logLevel: 2
leaderElection:
  namespace: "kube-system"
replicaCount: 1
strategy: {}
image:
  repository: quay.io/jetstack/cert-manager-controller
  pullPolicy: IfNotPresent
  tag: v0.14.3
clusterResourceNamespace: ""
serviceAccount:
  create: true
  name:
  annotations: {}
extraArgs: []
extraEnv: []
resources: {}
securityContext:
  enabled: false
  fsGroup: 1001
  runAsUser: 1001
podAnnotations: {}
podLabels: {}
nodeSelector: {}
ingressShim:
  defaultIssuerName: letsencrypt-prod
  defaultIssuerKind: ClusterIssuer
prometheus:
  enabled: true
  servicemonitor:
    enabled: false
    prometheusInstance: default
    targetPort: 9402
    path: /metrics
    interval: 60s
    scrapeTimeout: 30s
    labels: {}
affinity: {}
tolerations: []
webhook:
  enabled: true
  replicaCount: 1
  strategy: {}
  podAnnotations: {}
  extraArgs: []
  resources: {}
  nodeSelector: {}
  affinity: {}
  tolerations: []
  image:
    repository: quay.io/jetstack/cert-manager-webhook
    pullPolicy: IfNotPresent
    tag: v0.14.3
  injectAPIServerCA: true
  securePort: 10250
cainjector:
  replicaCount: 1
  strategy: {}
  podAnnotations: {}
  extraArgs: []
  resources: {}
  nodeSelector: {}
  affinity: {}
  tolerations: []
  image:
    repository: quay.io/jetstack/cert-manager-cainjector
    pullPolicy: IfNotPresent
    tag: v0.14.3
It seems that v0.14.3 had a bug of some sort. The problem does not occur for v0.15.0, so upgrading resolves it (a sketch follows).
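A sketch of the upgrade for a static-manifest install like this one (assuming the standard cert-manager release manifest URL for v0.15.0; check the release notes for CRD changes between versions first):
kubectl apply --validate=false -f https://github.com/jetstack/cert-manager/releases/download/v0.15.0/cert-manager.yaml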

Tensorflow serving object detection predict using Kubeflow

I followed the steps given in this post to deploy my TensorFlow model for prediction using GPUs on Google Kubernetes Engine and Kubeflow. I have exposed the service as a load balancer by modifying the YAML file in this way, changing the type from ClusterIP to LoadBalancer.
spec:
  clusterIP: A.B.C.D
  externalTrafficPolicy: Cluster
  ports:
  - name: grpc-tf-serving
    nodePort: 30098
    port: 9000
    protocol: TCP
    targetPort: 9000
  - name: http-tf-serving-proxy
    nodePort: 31399
    port: 8000
    protocol: TCP
    targetPort: 8000
  selector:
    app: my-model
  sessionAffinity: None
  type: LoadBalancer
The status changed to:
status:
  loadBalancer:
    ingress:
    - ip: W.X.Y.Z
Service specs (kubectl describe services my-model):
Name: my-model
Namespace: default
Labels: app=my-model
app.kubernetes.io/deploy-manager=ksonnet
ksonnet.io/component=model2
Annotations: getambassador.io/config:
---
apiVersion: ambassador/v0
kind: Mapping
name: tfserving-mapping-my-model-get
prefix: /models/my-model/
rewrite: /
method: GET
service: my-model.default:8000
---
apiVersion: ambassador/v0
kind: Mapping
name: tfserving-mapping-my-model-post
prefix: /models/my-model/
rewrite: /model/my-model:predict
method: POST
service: my-model.default:8000
ksonnet.io/managed:
{"pristine":"H4sIAAAAAAAA/7SRMY/UQAyFe35F5DpzCVweRcHW4QQBWKlQzQMhS/jZEckHmvGt9xplf+OZvfYjXRCgoIyz+/L8xsfgTR+5VxiEkA4vIYWfkQJgHDH+RAHhhYWNgpkB...
Selector: app=my-model
Type: LoadBalancer
IP: A.B.C.D
LoadBalancer Ingress: W.X.Y.Z
Port: grpc-tf-serving 9000/TCP
TargetPort: 9000/TCP
NodePort: grpc-tf-serving 30098/TCP
Endpoints: P.Q.R.S:9000
Port: http-tf-serving-proxy 8000/TCP
TargetPort: 8000/TCP
NodePort: http-tf-serving-proxy 31399/TCP
Endpoints: R.Q.R.S:8000
Session Affinity: None
External Traffic Policy: Cluster
Events: <none>
Pods Specs (kubectl describe pods):
Name: my-model-v1-bd6ccb757-qrwdv
Namespace: default
Node: gke-kuberflow-xyz-gpu-pool-5d4ebf17-56mf/SOME_IP
Start Time: Mon, 18 Feb 2019 18:11:24 +0530
Labels: app=my-model
pod-template-hash=682776313
version=v1
Annotations: <none>
Status: Running
IP: P.Q.R.S
Controlled By: ReplicaSet/my-model-v1-bd6ccb757
Containers:
my-model:
Container ID: docker://d14e8261ddfe606393da2ee45badac0136cee98rwa5611c47ad85733ce5d2c925
Image: tensorflow/serving:1.11.1-gpu
Image ID: docker-pullable://tensorflow/serving@sha256:907d7db828b28ewer234d0b3ca10e2d66bcd8ef82c5cccea761fcd4f1190191d2f
Port: 9000/TCP
Host Port: 0/TCP
Command:
/usr/bin/tensorflow_model_server
Args:
--port=9000
--model_name=my-model
--model_base_path=gs://xyz_kuber_app-xyz-identification/export/
State: Running
Started: Mon, 18 Feb 2019 18:11:25 +0530
Ready: True
Restart Count: 0
Limits:
cpu: 4
memory: 4Gi
nvidia.com/gpu: 1
Requests:
cpu: 1
memory: 1Gi
nvidia.com/gpu: 1
Environment: <none>
Mounts:
/var/run/secrets/kubernetes.io/serviceaccount from default-token-b6dpn (ro)
my-model-http-proxy:
Container ID: docker://c98e06ad75f3456c353395e9ad2e2e3bcbf0b38cd2634074704439cd5ebf335d
Image: gcr.io/kubeflow-images-public/tf-model-server-http-proxy:v20180606-asdasda
Image ID: docker-pullable://gcr.io/kubeflow-images-public/tf-model-server-http-proxy@sha256:SHA
Port: 8000/TCP
Host Port: 0/TCP
Command:
python
/usr/src/app/server.py
--port=8000
--rpc_port=9000
--rpc_timeout=10.0
State: Running
Started: Mon, 18 Feb 2019 18:11:25 +0530
Ready: True
Restart Count: 0
Limits:
cpu: 1
memory: 1Gi
Requests:
cpu: 500m
memory: 500Mi
Environment: <none>
Mounts:
/var/run/secrets/kubernetes.io/serviceaccount from default-token-b6dpn (ro)
Conditions:
Type Status
Initialized True
Ready True
PodScheduled True
Volumes:
default-token-b6dpn:
Type: Secret (a volume populated by a Secret)
SecretName: default-token-fsdf3
Optional: false
QoS Class: Burstable
Node-Selectors: <none>
Tolerations: node.kubernetes.io/not-ready:NoExecute for 300s
node.kubernetes.io/unreachable:NoExecute for 300s
nvidia.com/gpu:NoSchedule
Events: <none>
I used the command python predict.py --url=http://W.X.Y.Z:8000/model/my-model:predict to perform the prediction from the serving_script folder, but I am getting a 500 Internal Server Error as the response. What is going wrong here?
The code for prediction can be found here: https://github.com/kubeflow/examples/tree/master/object_detection/serving_script
It was a mistake on my end. I was using a different input format for the model: I was sending an image tensor instead of an encoded image string tensor. A sketch of the difference follows.
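For reference, a minimal sketch of the two request shapes (this assumes the model was exported with an encoded_image_string_tensor input and that the HTTP proxy accepts the standard TF Serving REST predict payload; the URL and file name are placeholders):
import base64
import json
import requests

URL = "http://W.X.Y.Z:8000/model/my-model:predict"  # placeholder address

with open("image.jpg", "rb") as f:
    jpeg_bytes = f.read()

# Wrong (what I was sending): a raw image tensor, i.e. nested lists of
# pixel values decoded from the image.
# body = {"instances": [pixel_array.tolist()]}

# Right: an encoded image string tensor, passed as a base64 "b64" object,
# the TF Serving REST convention for DT_STRING inputs.
body = {"instances": [{"b64": base64.b64encode(jpeg_bytes).decode("utf-8")}]}

resp = requests.post(URL, data=json.dumps(body))
print(resp.status_code, resp.text)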