Back-off restarting failed container while creating a redis deployment in k8s

I am trying to launch a deployment object in k8s with the kubernetes/redis image, but I am getting the error "Back-off restarting failed container". The issue occurs only with the redis image; I am able to successfully run deployments with the postgres image, etc.
Here is the config file:
apiVersion: apps/v1
kind: Deployment
metadata:
  name: redis-deployment
spec:
  replicas: 1
  selector:
    matchLabels:
      component: redis
  template:
    metadata:
      labels:
        component: redis
    spec:
      containers:
      - name: redis
        image: kubernetes/redis
        ports:
        - containerPort: 6379
Describe pod output:
Name:               redis-deployment-57dcf8ff69-9v8sz
Namespace:          default
Priority:           0
PriorityClassName:  <none>
Node:               minikube/10.0.2.15
Start Time:         Sun, 10 Mar 2019 11:13:00 +0530
Labels:             component=redis
                    pod-template-hash=57dcf8ff69
Annotations:        <none>
Status:             Running
IP:                 172.17.0.8
Controlled By:      ReplicaSet/redis-deployment-57dcf8ff69
Containers:
  redis:
    Container ID:   docker://556544175a99da6cd704ddc5ae6e65ee0a424275872d86543bbfef6eebceff5b
    Image:          kubernetes/redis
    Image ID:       docker-pullable://kubernetes/redis@sha256:60e8254f473b1df64340da257e8e0a029c0ac67a76bdde296f11eba6cde515c7
    Port:           6379/TCP
    State:          Waiting
      Reason:       CrashLoopBackOff
    Last State:     Terminated
      Reason:       Error
      Exit Code:    1
      Started:      Sun, 10 Mar 2019 20:12:26 +0530
      Finished:     Sun, 10 Mar 2019 20:13:28 +0530
    Ready:          False
    Restart Count:  13
    Environment:    <none>
    Mounts:
      /var/run/secrets/kubernetes.io/serviceaccount from default-token-zqj5b (ro)
Conditions:
  Type              Status
  Initialized       True
  Ready             False
  ContainersReady   False
  PodScheduled      True
Volumes:
  default-token-zqj5b:
    Type:        Secret (a volume populated by a Secret)
    SecretName:  default-token-zqj5b
    Optional:    false
QoS Class:       BestEffort
Node-Selectors:  <none>
Tolerations:     node.kubernetes.io/not-ready:NoExecute for 300s
                 node.kubernetes.io/unreachable:NoExecute for 300s
Events:
  Type     Reason          Age                From               Message
  ----     ------          ----               ----               -------
  Normal   Scheduled       9h                 default-scheduler  Successfully assigned default/redis-deployment-57dcf8ff69-9v8sz to minikube
  Normal   Pulling         5h (x5 over 9h)    kubelet, minikube  pulling image "kubernetes/redis"
  Normal   Pulled          5h (x5 over 5h)    kubelet, minikube  Successfully pulled image "kubernetes/redis"
  Normal   Created         5h (x5 over 5h)    kubelet, minikube  Created container
  Normal   Started         5h (x5 over 5h)    kubelet, minikube  Started container
  Warning  BackOff         5h (x53 over 5h)   kubelet, minikube  Back-off restarting failed container
  Normal   SandboxChanged  17m                kubelet, minikube  Pod sandbox changed, it will be killed and re-created.
  Normal   Pulling         11m (x4 over 16m)  kubelet, minikube  pulling image "kubernetes/redis"
  Normal   Pulled          11m (x4 over 16m)  kubelet, minikube  Successfully pulled image "kubernetes/redis"
  Normal   Created         11m (x4 over 16m)  kubelet, minikube  Created container
  Normal   Started         11m (x4 over 16m)  kubelet, minikube  Started container
  Warning  BackOff         1m (x39 over 15m)  kubelet, minikube  Back-off restarting failed container
I am using the kubernetes/redis image in this example because using the plain redis image fails completely: kubectl is not able to pull the image from Docker Hub, and I get an error saying the image pull failed. Not sure why!
Can anyone please help me out here?
EDIT:
Logs:
kubectl.exe logs redis-deployment-57dcf8ff69-9v8sz
Could not connect to Redis at -p:6379: Name or service not known
Failed to find master.

The config file below worked for me.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: redis-deployment
spec:
  replicas: 1
  selector:
    matchLabels:
      component: redis
  template:
    metadata:
      labels:
        component: redis
    spec:
      containers:
      - name: redis
        image: gcr.io/google_containers/redis:v1
        env:
        - name: MASTER
          value: "true"
        ports:
        - containerPort: 6379
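For anyone hitting the same crash loop: the "Failed to find master" log above suggests the image's startup script looks for a Redis master and exits when it can't find one, which is why setting the MASTER environment variable stops the restarts. A quick way to verify the fix (a sketch; the label and file name are assumed from the manifest above):
kubectl apply -f redis-deployment.yaml
kubectl get pods -l component=redis
# once the pod shows STATUS Running with RESTARTS 0:
kubectl exec -it <pod-name> -- redis-cli ping   # expect PONG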

Related

Strimzi Kafka Zookeeper not starting

I'm trying to deploy Kafka using Strimzi, but ZooKeeper keeps throwing the following exception:
Failed to verify hostname: 10.244.0.14 (org.apache.zookeeper.common.ZKTrustManager) [ListenerHandler-my-cluster-zookeeper-0.my-cluster-zookeeper-nodes.kafka.svc/10.244.1.20:3888]
javax.net.ssl.SSLPeerUnverifiedException: Certificate for <10.244.0.14> doesn't match any of the subject alternative names: [*.my-cluster-zookeeper-client.kafka.svc, my-cluster-zookeeper-client, my-cluster-zookeeper-1.my-cluster-zookeeper-nodes.kafka.svc.cluster.local, my-cluster-zookeeper-1.my-cluster-zookeeper-nodes.kafka.svc, my-cluster-zookeeper-client.kafka, my-cluster-zookeeper-client.kafka.svc, *.my-cluster-zookeeper-nodes.kafka.svc, *.my-cluster-zookeeper-nodes.kafka.svc.cluster.local, *.my-cluster-zookeeper-client.kafka.svc.cluster.local, my-cluster-zookeeper-client.kafka.svc.cluster.local]
Below is the deployment file I'm using:
apiVersion: kafka.strimzi.io/v1beta2
kind: Kafka
metadata:
  name: my-cluster
spec:
  kafka:
    version: 3.1.0
    replicas: 2
    listeners:
      - name: plain
        port: 9092
        type: internal
        tls: false
      - name: external
        port: 9094
        type: loadbalancer
        tls: false
    config:
      offsets.topic.replication.factor: 2
      transaction.state.log.replication.factor: 2
      transaction.state.log.min.isr: 2
      default.replication.factor: 2
      min.insync.replicas: 2
      inter.broker.protocol.version: "3.1"
    storage:
      type: ephemeral
  zookeeper:
    replicas: 2
    storage:
      type: ephemeral
This is how I created the Strimzi cluster operator:
kubectl create -f 'https://strimzi.io/install/latest?namespace=kafka' -n kafka

Asp.net Core get "Back-off restarting failed container" on AKS

I am trying to deploy my very first, simple ASP.NET Core Web API on AKS (following this article).
Here is my YAML file:
apiVersion: apps/v1
kind: Deployment
metadata:
  name: aexp
  labels:
    app: aexp
spec:
  replicas: 1
  selector:
    matchLabels:
      service: aexp
  template:
    metadata:
      labels:
        app: aexp
        service: aexp
    spec:
      containers:
      - name: aexp
        image: f2021.azurecr.io/aexp:v1
        imagePullPolicy: Always
        ports:
        - containerPort: 80
          protocol: TCP
        env:
        - name: ASPNETCORE_URLS
          value: http://+:80
---
apiVersion: v1
kind: Service
metadata:
  name: aexp
  labels:
    app: aexp
    service: aexp
spec:
  ports:
  - port: 80
    targetPort: 80
    protocol: TCP
  selector:
    service: aexp
It looks simple and straightforward, but I couldn't figure out why my pod gets "Back-off restarting failed container". Any advice or clue to prevent the error? Thanks in advance.
Name:         aexp-5b5b7b6464-5lfz4
Namespace:    default
Priority:     0
Node:         aks-nodepool1-38572550-vmss000000/10.240.0.4
Start Time:   Wed, 20 Jan 2021 10:01:52 +0700
Labels:       app=aexp
              pod-template-hash=5b5b7b6464
              service=aexp
Annotations:  <none>
Status:       Running
IP:           10.244.0.14
IPs:
  IP:           10.244.0.14
Controlled By:  ReplicaSet/aexp-5b5b7b6464
Containers:
  aexp:
    Container ID:   docker://25ffdb3ce92eeda465e1971daa363d6f532ac73ff82df2e9b3694a8949f50615
    Image:          f2021.azurecr.io/aexp:v1
    Image ID:       docker-pullable://f2021.azurecr.io/aexp@sha256:bf6aa2a47f5f857878280f5987192f1892e91e365b9e66df83538109b9e57c46
    Port:           80/TCP
    Host Port:      0/TCP
    State:          Waiting
      Reason:       CrashLoopBackOff
    Last State:     Terminated
      Reason:       Completed
      Exit Code:    0
      Started:      Wed, 20 Jan 2021 10:33:47 +0700
      Finished:     Wed, 20 Jan 2021 10:33:47 +0700
    Ready:          False
    Restart Count:  11
    Environment:
      ASPNETCORE_URLS:  http://+:80
    Mounts:
      /var/run/secrets/kubernetes.io/serviceaccount from default-token-g4ks9 (ro)
Conditions:
  Type              Status
  Initialized       True
  Ready             False
  ContainersReady   False
  PodScheduled      True
Volumes:
  default-token-g4ks9:
    Type:        Secret (a volume populated by a Secret)
    SecretName:  default-token-g4ks9
    Optional:    false
QoS Class:       BestEffort
Node-Selectors:  <none>
Tolerations:     node.kubernetes.io/not-ready:NoExecute op=Exists for 300s
                 node.kubernetes.io/unreachable:NoExecute op=Exists for 300s
Events:
  Type     Reason     Age                  From               Message
  ----     ------     ----                 ----               -------
  Normal   Scheduled  36m                  default-scheduler  Successfully assigned default/aexp-5b5b7b6464-5lfz4 to aks-nodepool1-38572550-vmss000000
  Normal   Pulled     35m (x4 over 36m)    kubelet            Successfully pulled image "f2021.azurecr.io/aexp:v1"
  Normal   Created    35m (x4 over 36m)    kubelet            Created container aexp
  Normal   Started    35m (x4 over 36m)    kubelet            Started container aexp
  Normal   Pulling    34m (x5 over 36m)    kubelet            Pulling image "f2021.azurecr.io/aexp:v1"
  Warning  BackOff    62s (x166 over 36m)  kubelet            Back-off restarting failed container
And here is my az snippet to create the AKS cluster:
az aks create \
--location $REGION \
--resource-group $AKS_RG \
--name $AKS_NAME \
--ssh-key-value ./.ssh/id_rsa.pub \
--service-principal "xxxxxxxx-b8d1-4206-8a8a-xxxxx66c086c" \
--client-secret "xxxx.xxxxeNzq25iJeuRjWTh~xxxxxUGxu" \
--network-plugin kubenet \
--load-balancer-sku basic \
--outbound-type loadBalancer \
--node-vm-size Standard_B2s \
--node-count 1 \
--tags 'ENV=DEV' 'SRV=EXAMPLE' \
--generate-ssh-keys
Update 1:
I tried with VS2019, starting Debug using "Bridge to Kubernetes", and it works: the same docker image, same deployment and same service.
Update 2: added the Dockerfile
#See https://aka.ms/containerfastmode to understand how Visual Studio uses this Dockerfile to build your images for faster debugging.
FROM mcr.microsoft.com/dotnet/core/aspnet:3.1-buster-slim AS base
WORKDIR /app
EXPOSE 80
EXPOSE 443
FROM mcr.microsoft.com/dotnet/core/sdk:3.1-buster AS build
WORKDIR /src
COPY ["Aexp/Aexp.csproj", "Aexp/"]
RUN dotnet restore "Aexp/Aexp.csproj"
COPY . .
WORKDIR "/src/Aexp"
RUN dotnet build "Aexp.csproj" -c Release -o /app/build
FROM build AS publish
RUN dotnet publish "Aexp.csproj" -c Release -o /app/publish
FROM base AS final
WORKDIR /app
COPY --from=publish /app/publish .
ENTRYPOINT ["dotnet", "Aexp.dll"]
Update 3 [Jan 27]: I figured out the issue doesn't relate to my code or my YAML at all. I have two Azure subscriptions: one has the issue, and one works just fine with the same code, same deployment.yaml and configuration.
There can be several reasons for the pod to be crashing. The best way forward is to check the logs of your pod to see whether the crash comes from your application.
kubectl logs aexp-5b5b7b6464-5lfz4 --previous
where --previous gives you access to the logs from the previous, crashed instance of the pod.
If the log is empty, you want to check your Dockerfile. It seems that the container does not have any long-running process, because it completed with a 'success' exit code:
Last State:     Terminated
  Reason:       Completed
  Exit Code:    0
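If that is the case here, a quick check is to run the image outside Kubernetes and see whether the process exits by itself (a sketch; the image name is taken from the deployment above):
docker run --rm -p 8080:80 f2021.azurecr.io/aexp:v1
# if this prints nothing and exits immediately with code 0,
# the entrypoint is not starting a long-running web server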

Debugging istio rate limiting handler

I'm trying to apply rate limiting on some of our internal services (inside the mesh).
I used the example from the docs and generated redis rate limiting configurations that include a (redis) handler, a quota instance, a quota spec, a quota spec binding, and a rule to apply the handler.
The redis handler:
apiVersion: config.istio.io/v1alpha2
kind: handler
metadata:
  name: redishandler
  namespace: istio-system
spec:
  compiledAdapter: redisquota
  params:
    redisServerUrl: <REDIS>:6379
    connectionPoolSize: 10
    quotas:
      - name: requestcountquota.instance.istio-system
        maxAmount: 10
        validDuration: 100s
        rateLimitAlgorithm: FIXED_WINDOW
        overrides:
          - dimensions:
              destination: s1
            maxAmount: 1
          - dimensions:
              destination: s3
            maxAmount: 1
          - dimensions:
              destination: s2
            maxAmount: 1
The quota instance (I'm only interested in limiting by destination at the moment):
apiVersion: config.istio.io/v1alpha2
kind: instance
metadata:
  name: requestcountquota
  namespace: istio-system
spec:
  compiledTemplate: quota
  params:
    dimensions:
      destination: destination.labels["app"] | destination.service.host | "unknown"
A quota spec, charging 1 per request if I understand correctly:
apiVersion: config.istio.io/v1alpha2
kind: QuotaSpec
metadata:
  name: request-count
  namespace: istio-system
spec:
  rules:
    - quotas:
        - charge: 1
          quota: requestcountquota
A quota binding spec that all participating services pre-fetch. I also tried with service: "*", which also did nothing.
apiVersion: config.istio.io/v1alpha2
kind: QuotaSpecBinding
metadata:
  name: request-count
  namespace: istio-system
spec:
  quotaSpecs:
    - name: request-count
      namespace: istio-system
  services:
    - name: s2
      namespace: default
    - name: s3
      namespace: default
    - name: s1
      namespace: default
    # - service: '*' # Uncomment this to bind *all* services to request-count
A rule to apply the handler, currently on all occasions (I tried with match clauses, but that didn't change anything either):
apiVersion: config.istio.io/v1alpha2
kind: rule
metadata:
  name: quota
  namespace: istio-system
spec:
  actions:
    - handler: redishandler
      instances:
        - requestcountquota
The VirtualService definitions are pretty similar for all participants:
apiVersion: networking.istio.io/v1alpha3
kind: VirtualService
metadata:
  name: s1
spec:
  hosts:
    - s1
  http:
    - route:
        - destination:
            host: s1
The problem is that nothing really happens and no rate limiting takes place. I tested with curl from pods inside the mesh. The redis instance is empty (no keys on db 0, which I assume is what the rate limiting would use), so I know it can't practically rate-limit anything.
The handler seems to be configured properly (how can I make sure?) because I had some errors in it which were reported in mixer (policy). There are still some errors, but none that I associate with this problem or the configuration. The only line in which the redis handler is mentioned is this:
2019-12-17T13:44:22.958041Z info adapters adapter closed all scheduled daemons and workers {"adapter": "redishandler.istio-system"}
But it's unclear whether that's a problem. I assume it's not.
These are the rest of the lines from the reload once I deploy:
2019-12-17T13:44:22.601644Z info Built new config.Snapshot: id='43'
2019-12-17T13:44:22.601866Z info adapters getting kubeconfig from: "" {"adapter": "kubernetesenv.istio-system"}
2019-12-17T13:44:22.601881Z warn Neither --kubeconfig nor --master was specified. Using the inClusterConfig. This might not work.
2019-12-17T13:44:22.602718Z info adapters Waiting for kubernetes cache sync... {"adapter": "kubernetesenv.istio-system"}
2019-12-17T13:44:22.903844Z info adapters Cache sync successful. {"adapter": "kubernetesenv.istio-system"}
2019-12-17T13:44:22.903878Z info adapters getting kubeconfig from: "" {"adapter": "kubernetesenv.istio-system"}
2019-12-17T13:44:22.903882Z warn Neither --kubeconfig nor --master was specified. Using the inClusterConfig. This might not work.
2019-12-17T13:44:22.904808Z info Setting up event handlers
2019-12-17T13:44:22.904939Z info Starting Secrets controller
2019-12-17T13:44:22.904991Z info Waiting for informer caches to sync
2019-12-17T13:44:22.957893Z info Cleaning up handler table, with config ID:42
2019-12-17T13:44:22.957924Z info adapters deleted remote controller {"adapter": "kubernetesenv.istio-system"}
2019-12-17T13:44:22.957999Z info adapters adapter closed all scheduled daemons and workers {"adapter": "prometheus.istio-system"}
2019-12-17T13:44:22.958041Z info adapters adapter closed all scheduled daemons and workers {"adapter": "redishandler.istio-system"}
2019-12-17T13:44:22.958065Z info adapters shutting down daemon... {"adapter": "kubernetesenv.istio-system"}
2019-12-17T13:44:22.958050Z info adapters shutting down daemon... {"adapter": "kubernetesenv.istio-system"}
2019-12-17T13:44:22.958096Z info adapters shutting down daemon... {"adapter": "kubernetesenv.istio-system"}
2019-12-17T13:44:22.958182Z info adapters shutting down daemon... {"adapter": "kubernetesenv.istio-system"}
2019-12-17T13:44:23.958109Z info adapters adapter closed all scheduled daemons and workers {"adapter": "kubernetesenv.istio-system"}
2019-12-17T13:55:21.042131Z info transport: loopyWriter.run returning. connection error: desc = "transport is closing"
2019-12-17T14:14:00.265722Z info transport: loopyWriter.run returning. connection error: desc = "transport is closing"
I'm using the demo profile with disablePolicyChecks: false to enable rate limiting. This is on Istio 1.4.0, deployed on EKS.
I also tried memquota (this is our staging environment) with low limits, and nothing seems to work. I never got a 429, no matter how far I went over the configured rate limit.
I don't know how to debug this and see where the configuration is wrong, causing it to do nothing.
Any help is appreciated.
I too spent hours trying to decipher the documentation and get a sample working.
According to the documentation, they recommend enabling policy checks:
https://istio.io/docs/tasks/policy-enforcement/rate-limiting/
However, when that did not work, I did an "istioctl profile dump", searched for policy, and tried several settings.
I used Helm install, passed the following, and was then able to get the described behaviour:
--set global.disablePolicyChecks=false \
--set values.pilot.policy.enabled=true    # ===> this made it work, but it's not in the docs.
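One way to confirm that policy enforcement is actually on after the install (a sketch, assuming the default istio-system namespace of an Istio 1.4 install; pod and configmap names may differ):
kubectl get pods -n istio-system | grep policy          # an istio-policy pod should be Running
kubectl -n istio-system get cm istio -o yaml | grep disablePolicyChecks   # should show false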

DigitalOcean pod has unbound immediate PersistentVolumeClaims

I am trying to run a Redis cluster in Kubernetes on DigitalOcean.
As a PoC, I simply tried running an example I found online (https://github.com/sanderploegsma/redis-cluster/blob/master/redis-cluster.yml), which is able to spin up the pods appropriately when running locally using minikube.
However, when running it on DigitalOcean, I always get the following error:
Warning FailedScheduling 3s (x8 over 17s) default-scheduler pod has unbound immediate PersistentVolumeClaims (repeated 4 times)
Given that I am not changing anything, I am not sure why this would not work. Does anyone have any suggestions?
EDIT: some additional info
$ kubectl describe pvc
Name:          data-redis-cluster-0
Namespace:     default
StorageClass:
Status:        Pending
Volume:
Labels:        app=redis-cluster
Annotations:   <none>
Finalizers:    [kubernetes.io/pvc-protection]
Capacity:
Access Modes:
Events:
  Type    Reason         Age                     From                         Message
  ----    ------         ----                    ----                         -------
  Normal  FailedBinding  3m19s (x3420 over 14h)  persistentvolume-controller  no persistent volumes available for this claim and no storage class is set
Mounted By:    <none>
EDIT: setting the default storage class partially resolved the problem!
However, the node is now not able to find available volumes to bind:
kubectl describe pvc:
Name:          data-redis-cluster-0
Namespace:     default
StorageClass:  local-storage
Status:        Pending
Volume:
Labels:        app=redis-cluster
Annotations:   <none>
Finalizers:    [kubernetes.io/pvc-protection]
Capacity:
Access Modes:
Events:
  Type    Reason                Age                     From                         Message
  ----    ------                ----                    ----                         -------
  Normal  WaitForFirstConsumer  12m (x9 over 13m)       persistentvolume-controller  waiting for first consumer to be created before binding
  Normal  WaitForFirstConsumer  3m19s (x26 over 9m34s)  persistentvolume-controller  waiting for first consumer to be created before binding
kubectl describe pod redis-cluster-0
....
Events:
  Type     Reason            Age                 From               Message
  ----     ------            ----                ----               -------
  Warning  FailedScheduling  16m (x25 over 17m)  default-scheduler  0/5 nodes are available: 1 node(s) had taints that the pod didn't tolerate, 4 node(s) didn't find available persistent volumes to bind.
kubectl describe sc
Name: local-storage
IsDefaultClass: Yes
Annotations: storageclass.kubernetes.io/is-default-class=true
Provisioner: kubernetes.io/no-provisioner
Parameters: <none>
AllowVolumeExpansion: <unset>
MountOptions: <none>
ReclaimPolicy: Delete
VolumeBindingMode: WaitForFirstConsumer
Events: <none>
kube-controller-manager pod logs:
I1028 15:30:56.154131 1 event.go:221] Event(v1.ObjectReference{Kind:"StatefulSet", Namespace:"default", Name:"redis-cluster", UID:"7528483e-dac6-11e8-871f-2e55450d570e", APIVersion:"apps/v1", ResourceVersion:"2588806", FieldPath:""}): type: 'Normal' reason: 'SuccessfulCreate' create Claim data-redis-cluster-0 Pod redis-cluster-0 in StatefulSet redis-cluster success
I1028 15:30:56.166649 1 event.go:221] Event(v1.ObjectReference{Kind:"PersistentVolumeClaim", Namespace:"default", Name:"data-redis-cluster-0", UID:"76746506-dac6-11e8-871f-2e55450d570e", APIVersion:"v1", ResourceVersion:"2588816", FieldPath:""}): type: 'Normal' reason: 'WaitForFirstConsumer' waiting for first consumer to be created before binding
I1028 15:30:56.220464 1 event.go:221] Event(v1.ObjectReference{Kind:"StatefulSet", Namespace:"default", Name:"redis-cluster", UID:"7528483e-dac6-11e8-871f-2e55450d570e", APIVersion:"apps/v1", ResourceVersion:"2588806", FieldPath:""}): type: 'Normal' reason: 'SuccessfulCreate' create Pod redis-cluster-0 in StatefulSet redis-cluster successful
I1028 15:30:57.004631 1 event.go:221] Event(v1.ObjectReference{Kind:"PersistentVolumeClaim", Namespace:"default", Name:"data-redis-cluster-0", UID:"76746506-dac6-11e8-871f-2e55450d570e", APIVersion:"v1", ResourceVersion:"2588825", FieldPath:""}): type: 'Normal' reason: 'WaitForFirstConsumer' waiting for first consumer to be created before binding
This:
no storage class is set
and an empty output from kubectl describe sc mean that there's no storage class.
I recommend installing the CSI driver for DigitalOcean. That will create a do-block-storage class using the Kubernetes CSI interface.
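For reference, the install is roughly as follows (a sketch; the exact release manifest comes from the digitalocean/csi-digitalocean repository, and the driver reads a DigitalOcean API token from a Secret named digitalocean in kube-system):
$ kubectl create secret generic digitalocean -n kube-system \
    --from-literal=access-token=$DIGITALOCEAN_ACCESS_TOKEN
$ kubectl apply -f <csi-digitalocean release manifest>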
Another option is to use local storage, with a local storage class:
$ cat <<EOF | kubectl apply -f -
kind: StorageClass
apiVersion: storage.k8s.io/v1
metadata:
  name: local-storage
provisioner: kubernetes.io/no-provisioner
volumeBindingMode: WaitForFirstConsumer
EOF
Then, for either case, you may need to set it as the default storage class if you don't specify storageClassName in your PVC:
$ kubectl patch storageclass local-storage -p '{"metadata": {"annotations":{"storageclass.kubernetes.io/is-default-class":"true"}}}'
or
$ kubectl patch storageclass do-block-storage -p '{"metadata": {"annotations":{"storageclass.kubernetes.io/is-default-class":"true"}}}'
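Either way, you can check which class is now the default; PVCs that omit storageClassName will use it:
$ kubectl get sc
The default class is marked with "(default)" next to its name.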
It is a StatefulSet using PersistentVolumeClaims.
You need to configure a default storageClass in your cluster so that the PersistentVolumeClaim can take its storage from there.
In minikube one is already available, so it succeeds without error:
C02W84XMHTD5:ucp iahmad$ kubectl get sc --all-namespaces
NAME PROVISIONER AGE
standard (default) k8s.io/minikube-hostpath 7d

Kubernetes Hostpath External Provisioner - PVC Pending

I have set up a single-node K8S cluster using kubeadm by following the instructions here:
The cluster is up and all system pods are running fine:
[root@umeshworkstation hostpath-provisioner]# kubectl get pods -n kube-system
NAME READY STATUS RESTARTS AGE
calico-etcd-n988r 1/1 Running 10 6h
calico-node-n1wmk 2/2 Running 10 6h
calico-policy-controller-1777954159-bd8rn 1/1 Running 0 6h
etcd-umeshworkstation 1/1 Running 1 6h
kube-apiserver-umeshworkstation 1/1 Running 1 6h
kube-controller-manager-umeshworkstation 1/1 Running 1 6h
kube-dns-3913472980-2ptjj 0/3 Pending 0 6h
kube-proxy-1d84l 1/1 Running 1 6h
kube-scheduler-umeshworkstation 1/1 Running 1 6h
I then downloaded the Hostpath external provisioner code from kubernetes-incubator and built it locally on the same node. The docker image for the provisioner built successfully, and I could even instantiate the provisioner pod using pod.yaml from the same location. The pod is running fine:
[root@umeshworkstation hostpath-provisioner]# kubectl describe pod hostpath-provisioner
Name:         hostpath-provisioner
Namespace:    default
Node:         umeshworkstation/172.17.24.123
Start Time:   Tue, 09 May 2017 23:44:41 -0400
Labels:       <none>
Annotations:  <none>
Status:       Running
IP:           192.168.8.65
Controllers:  <none>
Containers:
  hostpath-provisioner:
    Container ID:   docker://c600cfa7a2f5f958ad24e83372a1276a91b41cb67773b9605af4a0ae021ec914
    Image:          hostpath-provisioner:latest
    Image ID:       docker://sha256:f6def41ba7c096701c65bf0c0aba6ff31e030573e1a900e378432491ecc5c556
    Port:
    State:          Running
      Started:      Tue, 09 May 2017 23:44:45 -0400
    Ready:          True
    Restart Count:  0
    Environment:
      NODE_NAME:  (v1:spec.nodeName)
    Mounts:
      /tmp/hostpath-provisioner from pv-volume (rw)
      /var/run/secrets/kubernetes.io/serviceaccount from default-token-7wwvj (ro)
Conditions:
  Type           Status
  Initialized    True
  Ready          True
  PodScheduled   True
Volumes:
  pv-volume:
    Type:  HostPath (bare host directory volume)
    Path:  /tmp/hostpath-provisioner
  default-token-7wwvj:
    Type:        Secret (a volume populated by a Secret)
    SecretName:  default-token-7wwvj
    Optional:    false
QoS Class:       BestEffort
Node-Selectors:  <none>
Tolerations:     node.alpha.kubernetes.io/notReady=:Exists:NoExecute for 300s
                 node.alpha.kubernetes.io/unreachable=:Exists:NoExecute for 300s
Events:          <none>
I then created the storage class as per the instructions on the project home page, and the storage class was created fine:
[root@umeshworkstation hostpath-provisioner]# kubectl describe sc example-hostpath
Name: example-hostpath
IsDefaultClass: No
Annotations: <none>
Provisioner: example.com/hostpath
Parameters: <none>
Events: <none>
The next step was to create a PVC using claim.yaml from the same location, but the PVC remains in Pending state, and describe shows it's not able to locate the provisioner example.com/hostpath:
[root@umeshworkstation hostpath-provisioner]# kubectl describe pvc
Name:          hostpath
Namespace:     default
StorageClass:  example-hostpath
Status:        Pending
Volume:
Labels:        <none>
Annotations:   volume.beta.kubernetes.io/storage-class=example-hostpath
               volume.beta.kubernetes.io/storage-provisioner=example.com/hostpath
Capacity:
Access Modes:
Events:
  FirstSeen  LastSeen  Count  From                         SubObjectPath  Type    Reason                Message
  ---------  --------  -----  ----                         -------------  ------  ------                -------
  2h         11s       874    persistentvolume-controller                 Normal  ExternalProvisioning  cannot find provisioner "example.com/hostpath", expecting that a volume for the claim is provisioned either manually or via external software
The PVC has remained in Pending state forever because of this.
Am I missing something?
I have figured out the issue. Thanks @jaxxstorm for helping me move in the right direction.
When I inspected the provisioner pod logs, I could see that it was unable to access the API server to list StorageClass, PVC or PV objects, because it ran under the default service account, which does not have the privileges to access these APIs.
The solution was to create a separate service account, pod security policy, cluster role and cluster role binding, as explained for the NFS external provisioner here; see the sketch below.
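For reference, a minimal sketch of the RBAC side (names are illustrative; the exact rules should follow the NFS external provisioner example, and the provisioner pod spec must reference the service account):
apiVersion: v1
kind: ServiceAccount
metadata:
  name: hostpath-provisioner
  namespace: default
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: hostpath-provisioner-runner
rules:
  - apiGroups: [""]
    resources: ["persistentvolumes"]
    verbs: ["get", "list", "watch", "create", "delete"]
  - apiGroups: [""]
    resources: ["persistentvolumeclaims"]
    verbs: ["get", "list", "watch", "update"]
  - apiGroups: ["storage.k8s.io"]
    resources: ["storageclasses"]
    verbs: ["get", "list", "watch"]
  - apiGroups: [""]
    resources: ["events"]
    verbs: ["create", "update", "patch"]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: run-hostpath-provisioner
subjects:
  - kind: ServiceAccount
    name: hostpath-provisioner
    namespace: default
roleRef:
  kind: ClusterRole
  name: hostpath-provisioner-runner
  apiGroup: rbac.authorization.k8s.io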
After this I could see my PVC getting bound to the volume, and the hostpath showing the mount:
[root@umeshworkstation hostpath-provisioner]# kubectl get pvc
NAME STATUS VOLUME CAPACITY ACCESSMODES STORAGECLASS AGE
hostpath Bound pvc-8179c8d6-36db-11e7-9ed4-005056a21a50 1Mi RWX example-hostpath 1m
[root@umeshworkstation hostpath-provisioner]# ls /tmp/hostpath-provisioner/
pvc-8179c8d6-36db-11e7-9ed4-005056a21a50