Kubernetes dashboard authentication on atomic host - authentication

I am a total newbie in terms of kubernetes/atomic host, so my question may be really trivial or well discussed already - but unfortunately I couldn't find any clues on how to achieve my goal - that's why I am here.
I have set up kubernetes cluster on atomic hosts (right now i have just one master and one node). I am working in the cloud network, on the virtual machines.
[root@master ~]# kubectl get node
NAME STATUS AGE
192.168.2.3 Ready 9d
After a lot of fuss I managed to set up the kubernetes dashboard UI on my master.
[root@master ~]# kubectl describe pod --namespace=kube-system
Name: kubernetes-dashboard-3791223240-8jvs8
Namespace: kube-system
Node: 192.168.2.3/192.168.2.3
Start Time: Thu, 07 Sep 2017 10:37:31 +0200
Labels: k8s-app=kubernetes-dashboard
pod-template-hash=3791223240
Status: Running
IP: 172.16.43.2
Controllers: ReplicaSet/kubernetes-dashboard-3791223240
Containers:
kubernetes-dashboard:
Container ID: docker://8fddde282e41d25c59f51a5a4687c73e79e37828c4f7e960c1bf4a612966420b
Image: gcr.io/google_containers/kubernetes-dashboard-amd64:v1.6.3
Image ID: docker-pullable://gcr.io/google_containers/kubernetes-dashboard-amd64@sha256:2c4421ed80358a0ee97b44357b6cd6dc09be6ccc27dfe9d50c9bfc39a760e5fe
Port: 9090/TCP
Args:
--apiserver-host=http://192.168.2.2:8080
Limits:
cpu: 100m
memory: 300Mi
Requests:
cpu: 100m
memory: 100Mi
State: Running
Started: Fri, 08 Sep 2017 10:54:46 +0200
Last State: Terminated
Reason: Error
Exit Code: 2
Started: Thu, 07 Sep 2017 10:37:32 +0200
Finished: Fri, 08 Sep 2017 10:54:44 +0200
Ready: True
Restart Count: 1
Liveness: http-get http://:9090/ delay=30s timeout=30s period=10s #success=1 #failure=3
Volume Mounts: <none>
Environment Variables: <none>
Conditions:
Type Status
Initialized True
Ready True
PodScheduled True
No volumes.
QoS Class: Burstable
Tolerations: <none>
Events:
FirstSeen LastSeen Count From SubObjectPath Type Reason Message
--------- -------- ----- ---- ------------- -------- ------ -------
1d 32m 3 {kubelet 192.168.2.3} Warning MissingClusterDNS kubelet does not have ClusterDNS IP configured and cannot create Pod using "ClusterFirst" policy. Falling back to DNSDefault policy.
1d 32m 2 {kubelet 192.168.2.3} spec.containers{kubernetes-dashboard} Normal Pulled Container image "gcr.io/google_containers/kubernetes-dashboard-amd64:v1.6.3" already present on machine
32m 32m 1 {kubelet 192.168.2.3} spec.containers{kubernetes-dashboard} Normal Created Created container with docker id 8fddde282e41; Security:[seccomp=unconfined]
32m 32m 1 {kubelet 192.168.2.3} spec.containers{kubernetes-dashboard} Normal Started Started container with docker id 8fddde282e41
also
[root@master ~]# kubectl cluster-info
Kubernetes master is running at http://localhost:8080
kubernetes-dashboard is running at http://localhost:8080/api/v1/proxy/namespaces/kube-system/services/kubernetes-dashboard
Now, when I tried connecting to the dashboard (I tried accessing the dashboard via the browser on a Windows virtual machine in the same cloud network) using the address:
https://192.168.218.2:6443/api/v1/proxy/namespaces/kube-system/services/kubernetes-dashboard
I am getting "unauthorized". I believe it proves that the dashboard is indeed running under this address, but I need to set up some way of accessing it?
What i want to achieve in the long term:
I want to enable connecting to the dashboard using a login/password (later, when I learn a bit more, I will think about authenticating by certs or something safer than a password) from outside the cloud network. For now, connecting to the dashboard at all would do.
I know there are threads about authenticating, but most of them are mentioning something like:
Basic authentication is enabled by passing the
--basic-auth-file=SOMEFILE option to API server
And this is the part I cannot cope with - I have no idea how to pass options to the API server.
On the atomic host the api-server,kube-controller-manager and kube-scheduler are running in containers, so I get into the api-server container with command:
docker exec -it kube-apiserver.service bash
I saw a few times that I should edit a .json file in the /etc/kubernetes/manifest directory, but unfortunately there is no such file (or even such a directory).
I apologize if my problem is too trivial or not described well enough, but I'm new to (both) the IT world and Stack Overflow.
I would love to provide more info, but I am afraid I would end up including lots of useless information, so I decided to wait for your instructions in that regard.

Check out the wiki pages of the Kubernetes dashboard; they describe how to get access to the dashboard and how to authenticate to it. For quick access you can run:
kubectl proxy
And then go to following address:
http://localhost:8001/api/v1/namespaces/kube-system/services/https:kubernetes-dashboard:/proxy
You'll see two options: one of them is uploading your ~/.kube/config file and the other one is using a token. You can get a token by running the following command:
kubectl -n kube-system describe secret $(kubectl -n kube-system get secret | grep service-account-token | head -n 1 | awk '{print $1}')
Now just copy and paste the long token string into dashboard prompt and you're done.
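The answer above takes whichever service-account-token secret grep lists first in kube-system; which account that is depends on sort order, so before pasting the token into the dashboard it is worth confirming whose identity it carries, because the dashboard session then acts with that account's privileges. The block below is an added example written for this page, not code from the Kubernetes dashboard project or from the answer's author. Legacy service-account tokens are JWTs whose payload names the namespace and service account; the code decodes that payload (without verifying the signature, this is inspection only) and pulls the token field out of kubectl describe secret output. The sample describe output and the HS256-signed fixture token are constructed; real tokens are signed by the API server.

```python
"""Identify which service account a Kubernetes token belongs to before
pasting it into the dashboard. Added example, not from the original thread."""
import base64
import hashlib
import hmac
import json
from typing import Dict


def _b64url_decode(segment: str) -> bytes:
    # JWT segments are base64url without padding; restore it before decoding.
    segment += "=" * (-len(segment) % 4)
    return base64.urlsafe_b64decode(segment)


def _b64url_encode(raw: bytes) -> str:
    return base64.urlsafe_b64encode(raw).rstrip(b"=").decode()


def token_identity(token: str) -> Dict[str, str]:
    """Return the namespace and service-account name claimed in the token.
    The signature is NOT verified; this only tells you what the token says."""
    parts = token.strip().split(".")
    if len(parts) != 3:
        raise ValueError("not a JWT: expected header.payload.signature")
    payload = json.loads(_b64url_decode(parts[1]))
    return {
        "namespace": payload.get("kubernetes.io/serviceaccount/namespace", ""),
        "service_account": payload.get(
            "kubernetes.io/serviceaccount/service-account.name", ""),
    }


def token_from_describe(describe_output: str) -> str:
    """Pull the `token:` field out of `kubectl describe secret` output."""
    for line in describe_output.splitlines():
        key, _, value = line.strip().partition(":")
        if key == "token":
            return value.strip()
    raise ValueError("no token field in describe output")


def make_sample_token(namespace: str, sa_name: str) -> str:
    """Constructed fixture: an HS256 JWT with the claim layout of a legacy
    service-account token. Real tokens are RS256-signed by the API server."""
    header = _b64url_encode(json.dumps({"alg": "HS256", "typ": "JWT"}).encode())
    claims = {
        "iss": "kubernetes/serviceaccount",
        "kubernetes.io/serviceaccount/namespace": namespace,
        "kubernetes.io/serviceaccount/service-account.name": sa_name,
        "sub": f"system:serviceaccount:{namespace}:{sa_name}",
    }
    body = _b64url_encode(json.dumps(claims, separators=(",", ":")).encode())
    sig = hmac.new(b"fixture-key", f"{header}.{body}".encode(), hashlib.sha256).digest()
    return f"{header}.{body}.{_b64url_encode(sig)}"


# Constructed stand-in for `kubectl -n kube-system describe secret <name>` output.
SAMPLE_DESCRIBE = f"""Name:         default-token-abc12
Namespace:    kube-system
Type:  kubernetes.io/service-account-token

Data
====
ca.crt:     1025 bytes
namespace:  11 bytes
token:      {make_sample_token("kube-system", "default")}
"""

if __name__ == "__main__":
    # The account printed here is the identity the dashboard session will use.
    print(token_identity(token_from_describe(SAMPLE_DESCRIBE)))
```

If the printed account is not the one you meant to log in as, pick the secret for a specific service account (its name starts with the account name followed by -token-) rather than the first grep hit.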

Related

How can I configure the AdmissionConfiguration > PodSecurity > PodSecurityConfiguration in an EKS cluster?

If I understand right from Apply Pod Security Standards at the Cluster Level, in order to have a PSS (Pod Security Standard) as default for the whole cluster I need to create an AdmissionConfiguration in a file that the API server needs to consume during cluster creation.
I don't see any way to configure / provide the AdmissionConfiguration at CreateCluster; also I'm not sure how to provide this AdmissionConfiguration in a managed EKS node.
From the tutorials that use KinD or minikube it seems that the AdmissionConfiguration must be in a file that is referenced in the cluster-config.yaml, but if I'm not mistaken the EKS API server is managed and does not allow to change or even see this file.
The GitHub issue aws/container-roadmap "Allow Access to AdmissionConfiguration" seems to suggest that currently there is no possibility of providing AdmissionConfiguration at creation, but on the other hand aws-eks-best-practices says "These exemptions are applied statically in the PSA admission controller configuration as part of the API server configuration".
So, is there a way to provide PodSecurityConfiguration for the whole cluster in EKS? Or am I forced to just use per-namespace labels?
See also Enforce Pod Security Standards by Configuration the Built-in Admission Controller and EKS Best practices PSS and PSA
I don't think there is any way currently in EKS to provide configuration for the built-in PSA controller (Pod Security Admission controller).
But if you want to implement a cluster-wide default for PSS (Pod Security Standards) you can do that by installing the official pod-security-webhook as a Dynamic Admission Controller in EKS.
git clone https://github.com/kubernetes/pod-security-admission
cd pod-security-admission/webhook
make certs
kubectl apply -k .
The default podsecurityconfiguration.yaml in pod-security-admission/webhook/manifests/020-configmap.yaml allows EVERYTHING so you should edit it and write something like
apiVersion: v1
kind: ConfigMap
metadata:
  name: pod-security-webhook
  namespace: pod-security-webhook
data:
  podsecurityconfiguration.yaml: |
    apiVersion: pod-security.admission.config.k8s.io/v1beta1
    kind: PodSecurityConfiguration
    defaults:
      enforce: "restricted"
      enforce-version: "latest"
      audit: "restricted"
      audit-version: "latest"
      warn: "restricted"
      warn-version: "latest"
    exemptions:
      # Array of authenticated usernames to exempt.
      usernames: []
      # Array of runtime class names to exempt.
      runtimeClasses: []
      # Array of namespaces to exempt.
      namespaces: ["policy-test2"]
then
kubectl apply -k .
kubectl -n pod-security-webhook rollout restart deployment/pod-security-webhook # otherwise the pods won't reread the configuration changes
After those changes you can verify that the default forbids privileged pods with:
kubectl --context aihub-eks-terraform create ns policy-test1
kubectl --context aihub-eks-terraform -n policy-test1 run --image=ecerulm/ubuntu-tools:latest --rm -ti rubelagu-$RANDOM --privileged
Error from server (Forbidden): admission webhook "pod-security-webhook.kubernetes.io" denied the request: pods "rubelagu-32081" is forbidden: violates PodSecurity "restricted:latest": privileged (container "rubelagu-32081" must not set securityContext.privileged=true), allowPrivilegeEscalation != false (container "rubelagu-32081" must set securityContext.allowPrivilegeEscalation=false), unrestricted capabilities (container "rubelagu-32081" must set securityContext.capabilities.drop=["ALL"]), runAsNonRoot != true (pod or container "rubelagu-32081" must set securityContext.runAsNonRoot=true), seccompProfile (pod or container "rubelagu-32081" must set securityContext.seccompProfile.type to "RuntimeDefault" or "Localhost")
Note that you get the error forbidding privileged pods even when the namespace policy-test1 has no label pod-security.kubernetes.io/enforce, so you know that this rule comes from the pod-security-webhook that we just installed and configured.
Now if you want to create a pod you will be forced to create it in a way that complies with the restricted PSS, by specifying runAsNonRoot, seccompProfile.type and capabilities. For example:
apiVersion: v1
kind: Pod
metadata:
  name: test-1
spec:
  restartPolicy: Never
  securityContext:
    runAsNonRoot: true
    runAsUser: 1000
    runAsGroup: 3000
    fsGroup: 2000
    seccompProfile:
      type: RuntimeDefault
  containers:
  - name: test
    image: ecerulm/ubuntu-tools:latest
    imagePullPolicy: Always
    command: ["/bin/bash", "-c", "--", "sleep 900"]
    securityContext:
      privileged: false
      allowPrivilegeEscalation: false
      capabilities:
        drop:
        - ALL
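The webhook error in the answer above lists five restricted-profile violations, and the compliant manifest fixes each of them. The following is an added example written for this page, not code from the pod-security-admission project or from the answer's author: an offline linter that applies those same five checks to a pod manifest before it is submitted, so a manifest can be tested without a cluster. It also generalizes to the rule the admission controller uses for pod-level versus container-level securityContext (a container-level field overrides the pod-level one). The restricted profile has further checks (host namespaces, hostPath volumes, capabilities.add, and so on) that this example deliberately leaves out, and the privileged pod dict is a constructed stand-in for what the kubectl run --privileged command submits.

```python
"""Offline check of a subset of the Pod Security "restricted" profile.
Added example; mirrors the five violations in the webhook error above."""
from typing import Any, Dict, List

ALLOWED_SECCOMP = {"RuntimeDefault", "Localhost"}


def _effective(pod_ctx: Dict[str, Any], ctr_ctx: Dict[str, Any], key: str):
    """Container-level securityContext overrides pod-level, as the
    admission controller evaluates it."""
    if key in ctr_ctx:
        return ctr_ctx[key]
    return pod_ctx.get(key)


def check_restricted(pod: Dict[str, Any]) -> List[str]:
    """Return violation strings in the webhook's wording; empty means the
    pod passes this subset of restricted:latest."""
    spec = pod.get("spec", {})
    pod_ctx = spec.get("securityContext") or {}
    violations: List[str] = []
    for c in spec.get("containers", []) + spec.get("initContainers", []):
        name = c.get("name", "<unnamed>")
        ctx = c.get("securityContext") or {}
        if ctx.get("privileged") is True:
            violations.append(f'privileged (container "{name}" must not set securityContext.privileged=true)')
        if ctx.get("allowPrivilegeEscalation") is not False:
            violations.append(f'allowPrivilegeEscalation != false (container "{name}" must set securityContext.allowPrivilegeEscalation=false)')
        if "ALL" not in ((ctx.get("capabilities") or {}).get("drop") or []):
            violations.append(f'unrestricted capabilities (container "{name}" must set securityContext.capabilities.drop=["ALL"])')
        if _effective(pod_ctx, ctx, "runAsNonRoot") is not True:
            violations.append(f'runAsNonRoot != true (pod or container "{name}" must set securityContext.runAsNonRoot=true)')
        seccomp = _effective(pod_ctx, ctx, "seccompProfile") or {}
        if seccomp.get("type") not in ALLOWED_SECCOMP:
            violations.append(f'seccompProfile (pod or container "{name}" must set securityContext.seccompProfile.type to "RuntimeDefault" or "Localhost")')
    return violations


# Constructed: roughly what `kubectl run ... --privileged` submits.
PRIVILEGED_POD = {
    "apiVersion": "v1", "kind": "Pod",
    "metadata": {"name": "rubelagu-32081"},
    "spec": {"containers": [{
        "name": "rubelagu-32081", "image": "ecerulm/ubuntu-tools:latest",
        "securityContext": {"privileged": True},
    }]},
}

# The test-1 manifest from the answer above, as a dict.
COMPLIANT_POD = {
    "apiVersion": "v1", "kind": "Pod", "metadata": {"name": "test-1"},
    "spec": {
        "restartPolicy": "Never",
        "securityContext": {"runAsNonRoot": True, "runAsUser": 1000, "runAsGroup": 3000,
                            "fsGroup": 2000, "seccompProfile": {"type": "RuntimeDefault"}},
        "containers": [{
            "name": "test", "image": "ecerulm/ubuntu-tools:latest",
            "imagePullPolicy": "Always",
            "command": ["/bin/bash", "-c", "--", "sleep 900"],
            "securityContext": {"privileged": False, "allowPrivilegeEscalation": False,
                                "capabilities": {"drop": ["ALL"]}},
        }],
    },
}

if __name__ == "__main__":
    for v in check_restricted(PRIVILEGED_POD):
        print("DENY:", v)
    print("compliant pod violations:", check_restricted(COMPLIANT_POD))
```

Running the linter on the privileged pod reproduces all five violations from the webhook message; on the test-1 manifest it returns none.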

Unable to request cluster with one GPU on GKE

I'm trying to create a minimal cluster with 1 node and 1 GPU/node. My command:
gcloud container clusters create cluster-gpu --num-nodes=1 --zone=us-central1-a --machine-type="n1-highmem-2" --accelerator="type=nvidia-tesla-k80,count=1" --scopes="gke-default,storage-rw"
creates the cluster. Now when the following pod.yaml
apiVersion: v1
kind: Pod
metadata:
  name: gke-training-pod-gpu
spec:
  containers:
  - name: my-custom-container
    image: gcr.io/.../object-classification:gpu
    resources:
      limits:
        nvidia.com/gpu: 1
is applied to my cluster, I can see in the GKE dashboard that the gke-training-pod-gpu pod is never created. When I do the same as above, only replacing num-nodes=1 by num-nodes=2, this time I get the following error:
ERROR: (gcloud.container.clusters.create) ResponseError: code=403, message=Insufficient regional quota to satisfy request: resource "NVIDIA_K80_GPUS": request requires '2.0' and is short '1.0'. project has a quota of '1.0' with '1.0' available. View and manage quotas at https://console.cloud.google.com/iam-admin/quotas?usage=USED&project=...
Is there any way to use a GPU when the quota is 1?
EDIT:
when the pod has been created with the kubectl apply command, a kubectl describe pod gke-training-pod-gpu command shows the following event:
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Warning FailedScheduling 48s (x2 over 48s) default-scheduler 0/1 nodes are available: 1 Insufficient nvidia.com/gpu.
Looks like you need to install the NVIDIA GPU device driver on your worker node(s).
Running
kubectl apply -f https://raw.githubusercontent.com/GoogleCloudPlatform/container-engine-accelerators/master/nvidia-driver-installer/cos/daemonset-preloaded.yaml
should do the trick.
The best solution as I see it is to request a quota increase in the IAM & Admin Quotas page.
As for the reason this is happening, I can only imagine that both the node and the pod are requesting GPUs, but only the node is getting it because of the capped quota.

DigitalOcean pod has unbound immediate PersistentVolumeClaims

I am trying to run a Redis cluster in Kubernetes in DigitalOcean.
As a poc, I simply tried running an example I found online (https://github.com/sanderploegsma/redis-cluster/blob/master/redis-cluster.yml), which is able to spin up the pods appropriately when running locally using minikube.
However, when running it on Digital Ocean, I always get the following error:
Warning FailedScheduling 3s (x8 over 17s) default-scheduler pod has unbound immediate PersistentVolumeClaims (repeated 4 times)
Given that I am not changing anything, I am not sure why this would not work. Does anyone have any suggestions?
EDIT: some additional info
$ kubectl describe pvc
Name: data-redis-cluster-0
Namespace: default
StorageClass:
Status: Pending
Volume:
Labels: app=redis-cluster
Annotations: <none>
Finalizers: [kubernetes.io/pvc-protection]
Capacity:
Access Modes:
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Normal FailedBinding 3m19s (x3420 over 14h) persistentvolume-controller no persistent volumes available for this claim and no storage class is set
Mounted By: <none>
EDIT: setting the default storage class partially resolved the problem!
However, the node is now not able to find available volumes to bind:
kubectl describe pvc:
Name: data-redis-cluster-0
Namespace: default
StorageClass: local-storage
Status: Pending
Volume:
Labels: app=redis-cluster
Annotations: <none>
Finalizers: [kubernetes.io/pvc-protection]
Capacity:
Access Modes:
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Normal WaitForFirstConsumer 12m (x9 over 13m) persistentvolume-controller waiting for first consumer to be created before binding
Normal WaitForFirstConsumer 3m19s (x26 over 9m34s) persistentvolume-controller waiting for first consumer to be created before binding
kubectl describe pod redis-cluster-0
....
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Warning FailedScheduling 16m (x25 over 17m) default-scheduler 0/5 nodes are available: 1 node(s) had taints that the pod didn't tolerate, 4 node(s) didn't find available persistent volumes to bind.
kubectl describe sc
Name: local-storage
IsDefaultClass: Yes
Annotations: storageclass.kubernetes.io/is-default-class=true
Provisioner: kubernetes.io/no-provisioner
Parameters: <none>
AllowVolumeExpansion: <unset>
MountOptions: <none>
ReclaimPolicy: Delete
VolumeBindingMode: WaitForFirstConsumer
Events: <none>
kubernetes manager pod logs:
I1028 15:30:56.154131 1 event.go:221] Event(v1.ObjectReference{Kind:"StatefulSet", Namespace:"default", Name:"redis-cluster", UID:"7528483e-dac6-11e8-871f-2e55450d570e", APIVersion:"apps/v1", ResourceVersion:"2588806", FieldPath:""}): type: 'Normal' reason: 'SuccessfulCreate' create Claim data-redis-cluster-0 Pod redis-cluster-0 in StatefulSet redis-cluster success
I1028 15:30:56.166649 1 event.go:221] Event(v1.ObjectReference{Kind:"PersistentVolumeClaim", Namespace:"default", Name:"data-redis-cluster-0", UID:"76746506-dac6-11e8-871f-2e55450d570e", APIVersion:"v1", ResourceVersion:"2588816", FieldPath:""}): type: 'Normal' reason: 'WaitForFirstConsumer' waiting for first consumer to be created before binding
I1028 15:30:56.220464 1 event.go:221] Event(v1.ObjectReference{Kind:"StatefulSet", Namespace:"default", Name:"redis-cluster", UID:"7528483e-dac6-11e8-871f-2e55450d570e", APIVersion:"apps/v1", ResourceVersion:"2588806", FieldPath:""}): type: 'Normal' reason: 'SuccessfulCreate' create Pod redis-cluster-0 in StatefulSet redis-cluster successful
I1028 15:30:57.004631 1 event.go:221] Event(v1.ObjectReference{Kind:"PersistentVolumeClaim", Namespace:"default", Name:"data-redis-cluster-0", UID:"76746506-dac6-11e8-871f-2e55450d570e", APIVersion:"v1", ResourceVersion:"2588825", FieldPath:""}): type: 'Normal' reason: 'WaitForFirstConsumer' waiting for first consumer to be created before binding
This ("no storage class is set") and an empty output for kubectl describe sc mean that there's no storage class.
I recommend installing the CSI-driver for Digital Ocean. That will create a do-block-storage class using the Kubernetes CSI interface.
Another option is to use local storage. Using a local storage class:
$ cat <<EOF | kubectl apply -f -
kind: StorageClass
apiVersion: storage.k8s.io/v1
metadata:
  name: local-storage
provisioner: kubernetes.io/no-provisioner
volumeBindingMode: WaitForFirstConsumer
EOF
Then for either case you may need to set it as a default storage class if you don't specify storageClassName in your PVC:
$ kubectl patch storageclass local-storage -p '{"metadata": {"annotations":{"storageclass.kubernetes.io/is-default-class":"true"}}}'
or
$ kubectl patch storageclass do-block-storage -p '{"metadata": {"annotations":{"storageclass.kubernetes.io/is-default-class":"true"}}}'
It is a StatefulSet using PersistentVolumeClaims.
You need to configure a default storageClass in your cluster so that the PersistentVolumeClaim can take the storage from there.
In minikube one is already available so it succeeds without error:
C02W84XMHTD5:ucp iahmad$ kubectl get sc --all-namespaces
NAME PROVISIONER AGE
standard (default) k8s.io/minikube-hostpath 7d
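The two answers above, and the question's two edits, walk through three states of the same claim: no storage class and no default class, a kubernetes.io/no-provisioner class with WaitForFirstConsumer that waits for a pod, and then a scheduler that finds no pre-created volume to bind. The block below is an added example written for this page, not code from Kubernetes or from either answer's author: a deliberately simplified model of the persistentvolume-controller's decisions that reproduces those event messages from the objects involved, so the state of a cluster can be reasoned about offline. It ignores access modes, node affinity and reclaim policy, and its behaviour is the editor's assumption about the controller, not its code. All objects, including the DigitalOcean CSI provisioner name in the dynamic class, are constructed.

```python
"""Why a PersistentVolumeClaim stays Pending: simplified binding model.
Added example; an assumption about controller behaviour, not its code."""
from typing import Any, Dict, List, Optional

DEFAULT_ANNOTATION = "storageclass.kubernetes.io/is-default-class"


def _to_bytes(q: str) -> int:
    """Parse the subset of Kubernetes quantities used here (Gi, Mi, Ki, bytes)."""
    for suffix, mult in (("Gi", 1024 ** 3), ("Mi", 1024 ** 2), ("Ki", 1024)):
        if q.endswith(suffix):
            return int(q[: -len(suffix)]) * mult
    return int(q)


def _select_class(pvc: Dict[str, Any], classes: List[Dict[str, Any]]) -> Optional[Dict[str, Any]]:
    """Explicit storageClassName wins; otherwise fall back to the default-annotated class."""
    wanted = pvc["spec"].get("storageClassName")
    if wanted is not None:
        return next((sc for sc in classes if sc["metadata"]["name"] == wanted), None)
    return next((sc for sc in classes
                 if sc["metadata"].get("annotations", {}).get(DEFAULT_ANNOTATION) == "true"), None)


def binding_status(pvc: Dict[str, Any], classes: List[Dict[str, Any]],
                   pvs: List[Dict[str, Any]], has_consumer: bool) -> str:
    """Return a one-line status in the wording of the events seen in the thread."""
    sc = _select_class(pvc, classes)
    if sc is None:
        return "Pending: no persistent volumes available for this claim and no storage class is set"
    if sc.get("volumeBindingMode") == "WaitForFirstConsumer" and not has_consumer:
        return "Pending: waiting for first consumer to be created before binding"
    if sc["provisioner"] != "kubernetes.io/no-provisioner":
        return f"Pending: waiting for {sc['provisioner']} to provision a volume"
    # no-provisioner: somebody must have created a matching PersistentVolume by hand.
    need = _to_bytes(pvc["spec"]["resources"]["requests"]["storage"])
    for pv in pvs:
        if pv["spec"].get("storageClassName") != sc["metadata"]["name"]:
            continue
        if pv.get("status", {}).get("phase", "Available") != "Available":
            continue
        if _to_bytes(pv["spec"]["capacity"]["storage"]) >= need:
            return f"Bound: {pv['metadata']['name']}"
    return "Pending: didn't find available persistent volumes to bind"


# Constructed objects modelled on the thread above.
PVC = {"metadata": {"name": "data-redis-cluster-0"},
       "spec": {"resources": {"requests": {"storage": "1Gi"}}}}
LOCAL_SC = {"metadata": {"name": "local-storage", "annotations": {DEFAULT_ANNOTATION: "true"}},
            "provisioner": "kubernetes.io/no-provisioner",
            "volumeBindingMode": "WaitForFirstConsumer"}
DO_SC = {"metadata": {"name": "do-block-storage", "annotations": {DEFAULT_ANNOTATION: "true"}},
         "provisioner": "dobs.csi.digitalocean.com"}  # assumed provisioner name
LOCAL_PV = {"metadata": {"name": "local-pv-0"},
            "spec": {"storageClassName": "local-storage", "capacity": {"storage": "5Gi"}},
            "status": {"phase": "Available"}}

if __name__ == "__main__":
    print(binding_status(PVC, [], [], False))            # first edit in the question
    print(binding_status(PVC, [LOCAL_SC], [], False))    # second edit, before the pod
    print(binding_status(PVC, [LOCAL_SC], [], True))     # scheduler finds no volume
    print(binding_status(PVC, [LOCAL_SC], [LOCAL_PV], True))
    print(binding_status(PVC, [DO_SC], [], False))       # dynamic provisioning path
```

The practical consequence of the model: a no-provisioner class only solves the "no storage class" event; the claim still needs a PersistentVolume created by hand (or a dynamic provisioner such as the DigitalOcean CSI driver) before the StatefulSet pod can schedule.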

Kubernetes Hostpath External Provisioner - PVC Pending

I have set up a single node K8S cluster using kubeadm by following the instructions here:
The cluster is up and all system pods are running fine:
[root@umeshworkstation hostpath-provisioner]# kubectl get pods -n kube-system
NAME READY STATUS RESTARTS AGE
calico-etcd-n988r 1/1 Running 10 6h
calico-node-n1wmk 2/2 Running 10 6h
calico-policy-controller-1777954159-bd8rn 1/1 Running 0 6h
etcd-umeshworkstation 1/1 Running 1 6h
kube-apiserver-umeshworkstation 1/1 Running 1 6h
kube-controller-manager-umeshworkstation 1/1 Running 1 6h
kube-dns-3913472980-2ptjj 0/3 Pending 0 6h
kube-proxy-1d84l 1/1 Running 1 6h
kube-scheduler-umeshworkstation 1/1 Running 1 6h
I then downloaded the Hostpath external provisioner code from kubernetes-incubator and built it locally on the same node. The docker image for the provisioner got built successfully and I could even instantiate the provisioner pod using pod.yaml from the same location. The pod is running fine:
[root@umeshworkstation hostpath-provisioner]# kubectl describe pod hostpath-provisioner
Name: hostpath-provisioner
Namespace: default
Node: umeshworkstation/172.17.24.123
Start Time: Tue, 09 May 2017 23:44:41 -0400
Labels: <none>
Annotations: <none>
Status: Running
IP: 192.168.8.65
Controllers: <none>
Containers:
hostpath-provisioner:
Container ID: docker://c600cfa7a2f5f958ad24e83372a1276a91b41cb67773b9605af4a0ae021ec914
Image: hostpath-provisioner:latest
Image ID: docker://sha256:f6def41ba7c096701c65bf0c0aba6ff31e030573e1a900e378432491ecc5c556
Port:
State: Running
Started: Tue, 09 May 2017 23:44:45 -0400
Ready: True
Restart Count: 0
Environment:
NODE_NAME: (v1:spec.nodeName)
Mounts:
/tmp/hostpath-provisioner from pv-volume (rw)
/var/run/secrets/kubernetes.io/serviceaccount from default-token-7wwvj (ro)
Conditions:
Type Status
Initialized True
Ready True
PodScheduled True
Volumes:
pv-volume:
Type: HostPath (bare host directory volume)
Path: /tmp/hostpath-provisioner
default-token-7wwvj:
Type: Secret (a volume populated by a Secret)
SecretName: default-token-7wwvj
Optional: false
QoS Class: BestEffort
Node-Selectors: <none>
Tolerations: node.alpha.kubernetes.io/notReady=:Exists:NoExecute for 300s
node.alpha.kubernetes.io/unreachable=:Exists:NoExecute for 300s
Events: <none>
I then created the storage class as per the instructions of project home, and storage class is created fine:
[root@umeshworkstation hostpath-provisioner]# kubectl describe sc example-hostpath
Name: example-hostpath
IsDefaultClass: No
Annotations: <none>
Provisioner: example.com/hostpath
Parameters: <none>
Events: <none>
The next step was to create a PVC using claim.yaml from the same location, but the PVC is remaining in Pending state, and describe shows it's not able to locate the provisioner example.com/hostpath:
[root@umeshworkstation hostpath-provisioner]# kubectl describe pvc
Name: hostpath
Namespace: default
StorageClass: example-hostpath
Status: Pending
Volume:
Labels: <none>
Annotations: volume.beta.kubernetes.io/storage-class=example-hostpath
volume.beta.kubernetes.io/storage-provisioner=example.com/hostpath
Capacity:
Access Modes:
Events:
FirstSeen LastSeen Count From SubObjectPath Type Reason Message
--------- -------- ----- ---- ------------- -------- ------ -------
2h 11s 874 persistentvolume-controller Normal ExternalProvisioning cannot find provisioner "example.com/hostpath", expecting that a volume for the claim is provisioned either manually or via external software
The PVC has remained forever in Pending state because of this.
Am I missing something?
I have figured out the issue. Thanks @jaxxstorm for helping me move in the right direction.
When I inspected the provisioner pod logs I could see that it's unable to access the API server to list StorageClass, PVC or PVs as it was created with the default service account, which does not have the privileges to access these APIs.
The solution was to create a separate service account, pod security policy, cluster role and cluster role binding, as explained for NFS external provisioner here
After this I could see my PVC getting bound to the volume and the hostpath showing the mount
[root@umeshworkstation hostpath-provisioner]# kubectl get pvc
NAME STATUS VOLUME CAPACITY ACCESSMODES STORAGECLASS AGE
hostpath Bound pvc-8179c8d6-36db-11e7-9ed4-005056a21a50 1Mi RWX example-hostpath 1m
[root@umeshworkstation hostpath-provisioner]# ls /tmp/hostpath-provisioner/
pvc-8179c8d6-36db-11e7-9ed4-005056a21a50
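The fix above was to stop running the provisioner under the default service account and give it its own ServiceAccount, ClusterRole and ClusterRoleBinding. The block below is an added example written for this page, not the kubernetes-incubator project's own manifests: it builds that RBAC as JSON that kubectl apply -f - accepts, from the verbs an external provisioner loop needs (list and watch claims, classes and volumes; create and delete volumes; update claims; emit events), and audits the resulting role so that nothing broader, such as reading secrets or creating pods, is granted by accident. The verb list follows the RBAC the external-storage project publishes for its NFS provisioner as the editor recalls it and is an assumption to verify against the provisioner actually deployed; the PodSecurityPolicy the answer also mentions is not included.

```python
"""Least-privilege RBAC for an external provisioner, as JSON for kubectl.
Added example; verb list is an assumption modelled on the NFS provisioner."""
import json
from typing import Any, Dict, List, Tuple

# Permissions the provisioner loop needs. "" is the core API group.
PROVISIONER_RULES: List[Dict[str, List[str]]] = [
    {"apiGroups": [""], "resources": ["persistentvolumes"],
     "verbs": ["get", "list", "watch", "create", "delete"]},
    {"apiGroups": [""], "resources": ["persistentvolumeclaims"],
     "verbs": ["get", "list", "watch", "update"]},
    {"apiGroups": ["storage.k8s.io"], "resources": ["storageclasses"],
     "verbs": ["get", "list", "watch"]},
    {"apiGroups": [""], "resources": ["events"],
     "verbs": ["create", "update", "patch"]},
]


def build_rbac(name: str = "hostpath-provisioner", namespace: str = "default") -> Dict[str, Any]:
    """ServiceAccount, ClusterRole and ClusterRoleBinding wrapped in one v1 List."""
    sa = {"apiVersion": "v1", "kind": "ServiceAccount",
          "metadata": {"name": name, "namespace": namespace}}
    role = {"apiVersion": "rbac.authorization.k8s.io/v1", "kind": "ClusterRole",
            "metadata": {"name": f"{name}-runner"}, "rules": PROVISIONER_RULES}
    binding = {"apiVersion": "rbac.authorization.k8s.io/v1", "kind": "ClusterRoleBinding",
               "metadata": {"name": f"run-{name}"},
               "subjects": [{"kind": "ServiceAccount", "name": name, "namespace": namespace}],
               "roleRef": {"kind": "ClusterRole", "name": f"{name}-runner",
                           "apiGroup": "rbac.authorization.k8s.io"}}
    return {"apiVersion": "v1", "kind": "List", "items": [sa, role, binding]}


def allows(role: Dict[str, Any], api_group: str, resource: str, verb: str) -> bool:
    """Evaluate one (group, resource, verb) against the role's rules, honouring '*'."""
    for rule in role["rules"]:
        if (api_group in rule["apiGroups"] or "*" in rule["apiGroups"]) \
                and (resource in rule["resources"] or "*" in rule["resources"]) \
                and (verb in rule["verbs"] or "*" in rule["verbs"]):
            return True
    return False


# What the provisioner must be able to do, and what it must not be able to do.
REQUIRED: List[Tuple[str, str, str]] = [
    ("", "persistentvolumes", "create"), ("", "persistentvolumes", "delete"),
    ("", "persistentvolumeclaims", "list"), ("", "persistentvolumeclaims", "update"),
    ("storage.k8s.io", "storageclasses", "list"),
]
FORBIDDEN: List[Tuple[str, str, str]] = [
    ("", "secrets", "get"), ("", "pods", "create"),
    ("rbac.authorization.k8s.io", "clusterroles", "create"),
]


def audit_role(role: Dict[str, Any]) -> Dict[str, List[str]]:
    """Required permissions that are missing, and forbidden ones that are present."""
    label = lambda g, r, v: f"{g or 'core'}/{r}:{v}"
    return {
        "missing": [label(g, r, v) for g, r, v in REQUIRED if not allows(role, g, r, v)],
        "excess": [label(g, r, v) for g, r, v in FORBIDDEN if allows(role, g, r, v)],
    }


if __name__ == "__main__":
    manifest = build_rbac()
    print(json.dumps(manifest, indent=2))   # pipe into: kubectl apply -f -
    print(audit_role(manifest["items"][1]))  # expect no missing and no excess
```

Set serviceAccountName in the provisioner's pod.yaml to the account this emits; a role binding alone does nothing for a pod that still runs as default.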

OpenShift Origin: Node not ready

I appear to have some problem with my installation of OpenShift Origin.
When I get endpoints for the router, I get the following:
oc get endpoints --namespace=default --selector=router
NAME ENDPOINTS AGE
router-west <none> 21m
Obviously the router should have at least one endpoint.
I'm trying to follow the troubleshooting guide on https://docs.openshift.com/enterprise/3.1/admin_guide/sdn_troubleshooting.html#debugging-the-router however it does not provide assistance in the situation where the router has no endpoints.
When I get my list of nodes, I get:
oc get nodes
NAME LABELS STATUS AGE
openshift.hughestech.space kubernetes.io/hostname=openshift.mydomain.com NotReady 38d
When I describe the node, I get the following:
oc describe node openshift.mydomain.com
Name: openshift.mydomain.com
Labels: kubernetes.io/hostname=openshift.mydomain.com
CreationTimestamp: Sat, 06 Feb 2016 21:44:23 +0100
Phase:
Conditions:
Type Status LastHeartbeatTime LastTransitionTime Reason Message
──── ────── ───────────────── ────────────────── ────── ───────
Ready Unknown Fri, 04 Mar 2016 18:50:39 +0100 Fri, 04 Mar 2016 18:51:21 +0100 NodeStatusUnknown Kubelet stopped posting node status.
Addresses: 88.198.37.183,88.198.37.183
Capacity:
memory: 24515560Ki
pods: 40
cpu: 8
System Info:
Machine ID: bafaea4f3c4c4cf6a632047c1d14db1a
System UUID: 00000000-0000-0000-0000-002421DDE3D7
Boot ID: f9febe14-ec61-41d5-b7c3-db2e42f9b452
Kernel Version: 3.10.0-327.4.5.el7.x86_64
OS Image: Red Hat Enterprise Linux
Container Runtime Version: docker://1.8.2-el7
Kubelet Version: v1.1.0-origin-1107-g4c8e6f4
Kube-Proxy Version: v1.1.0-origin-1107-g4c8e6f4
ExternalID: openshift.mydomain.com
Non-terminated Pods: (0 in total)
Namespace Name CPU Requests CPU Limits Memory Requests Memory Limits
───────── ──── ──────────── ────────── ─────────────── ─────────────
Allocated resources:
(Total limits may be over 100%, i.e., overcommitted. More info: http://releases.k8s.io/HEAD/docs/user-guide/compute-resources.md)
CPU Requests CPU Limits Memory Requests Memory Limits
──────────── ────────── ─────────────── ─────────────
0 (0%) 0 (0%) 0 (0%) 0 (0%)
No events.
Where have I gone wrong? What do I need to do?
Thanks
Restart the node service and see if that makes a difference in oc get nodes output.
systemctl restart origin-node
Unless your node is running you cannot have a running router pod, resulting in no endpoints.