Unable to request cluster with one GPU on GKE

Unable to request cluster with one GPU on GKE - gpu

I'm trying to create a minimal cluster with 1 node and 1 GPU/node. My command:
gcloud container clusters create cluster-gpu --num-nodes=1 --zone=us-central1-a --machine-type="n1-highmem-2" --accelerator="type=nvidia-tesla-k80,count=1" --scopes="gke-default,storage-rw"
creates the cluster. Now when the following pod.yaml
apiVersion: v1
kind: Pod
metadata:
name: gke-training-pod-gpu
spec:
containers:
- name: my-custom-container
image: gcr.io/.../object-classification:gpu
resources:
limits:
nvidia.com/gpu: 1
is applied to my cluster, I can see in the GKE dashboard that the gke-training-pod-gpu pod is never created. When I do the same as above, only replacing num-nodes=1 by num-nodes=2, this time I get the following error:
ERROR: (gcloud.container.clusters.create) ResponseError: code=403, message=Insufficient regional quota to satisfy request: resource "NVIDIA_K80_GPUS": request requires '2.0' and is short '1.0'. project has a quota of '1.0' with '1.0' available. View and manage quotas at https://console.cloud.google.com/iam-admin/quotas?usage=USED&project=...
Is there any way to use a GPU when the quota is 1?
EDIT:
when pod has been created with kubectl apply command, a kubectl describe pod gke-training-pod-gpu command shows following event:
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Warning FailedScheduling 48s (x2 over 48s) default-scheduler 0/1 nodes are available: 1 Insufficient nvidia.com/gpu.

Looks like you need to install the NVIDIA CPU device driver on your worker node(s).
Running
kubectl apply -f https://raw.githubusercontent.com/GoogleCloudPlatform/container-engine-accelerators/master/nvidia-driver-installer/cos/daemonset-preloaded.yaml
should do the trick.

The best solution as I see it is to request a quota increase in the IAM & Admin Quotas page.
As for the reason this is happening, I can only imagine that both the node and the pod are requesting GPUs, but only the node is getting it because of the capped quota.

Related

How can I configure the AdmissionConfiguration > PodSecurity > PodSecurityConfiguration in an EKS cluster?

If I understand right from Apply Pod Security Standards at the Cluster Level, in order to have a PSS (Pod Security Standard) as default for the whole cluster I need to create an AdmissionConfiguration in a file that the API server needs to consume during cluster creation.
I don't see any way to configure / provide the AdmissionConfiguration at CreateCluster , also I'm not sure how to provide this AdmissionConfiguration in a managed EKS node.
From the tutorials that use KinD or minikube it seems that the AdmissionConfiguration must be in a file that is referenced in the cluster-config.yaml, but if I'm not mistaken the EKS API server is managed and does not allow to change or even see this file.
The GitHub issue aws/container-roadmap Allow Access to AdmissionConfiguration seems to suggest that currently there is no possibility of providing AdmissionConfiguration at creation, but on the other hand aws-eks-best-practices says These exemptions are applied statically in the PSA admission controller configuration as part of the API server configuration
so, is there a way to provide PodSecurityConfiguration for the whole cluster in EKS? or I'm forced to just use per-namespace labels?
See also Enforce Pod Security Standards by Configuration the Built-in Admission Controller and EKS Best practices PSS and PSA

I don't think there is any way currently in EKS to provide configuration for the built-in PSA controller (Pod Security Admission controller).
But if you want to implement a cluster-wide default for PSS (Pod Security Standards) you can do that by installing the the official pod-security-webhook as a Dynamic Admission Controller in EKS.
git clone https://github.com/kubernetes/pod-security-admission
cd pod-security-admission/webhook
make certs
kubectl apply -k .
The default podsecurityconfiguration.yaml in pod-security-admission/webhook/manifests/020-configmap.yaml allows EVERYTHING so you should edit it and write something like
apiVersion: v1
kind: ConfigMap
metadata:
name: pod-security-webhook
namespace: pod-security-webhook
data:
podsecurityconfiguration.yaml: |
apiVersion: pod-security.admission.config.k8s.io/v1beta1
kind: PodSecurityConfiguration
defaults:
enforce: "restricted"
enforce-version: "latest"
audit: "restricted"
audit-version: "latest"
warn: "restricted"
warn-version: "latest"
exemptions:
# Array of authenticated usernames to exempt.
usernames: []
# Array of runtime class names to exempt.
runtimeClasses: []
# Array of namespaces to exempt.
namespaces: ["policy-test2"]
then
kubectl apply -k .
kubectl -n pod-security-webhook rollout restart deployment/pod-security-webhook # otherwise the pods won't reread the configuration changes
After those changes you can verify that the default forbids privileged pods with:
kubectl --context aihub-eks-terraform create ns policy-test1
kubectl --context aihub-eks-terraform -n policy-test1 run --image=ecerulm/ubuntu-tools:latest --rm -ti rubelagu-$RANDOM --privileged
Error from server (Forbidden): admission webhook "pod-security-webhook.kubernetes.io" denied the request: pods "rubelagu-32081" is forbidden: violates PodSecurity "restricted:latest": privileged (container "rubelagu-32081" must not set securityContext.privileged=true), allowPrivilegeEscalation != false (container "rubelagu-32081" must set securityContext.allowPrivilegeEscalation=false), unrestricted capabilities (container "rubelagu-32081" must set securityContext.capabilities.drop=["ALL"]), runAsNonRoot != true (pod or container "rubelagu-32081" must set securityContext.runAsNonRoot=true), seccompProfile (pod or container "rubelagu-32081" must set securityContext.seccompProfile.type to "RuntimeDefault" or "Localhost")
Note: that you get the error forbidding privileged pods even when the namespace policy-test1 has no label pod-security.kubernetes.io/enforce, so you know that this rule comes from the pod-security-webhook that we just installed and configured.
Now if you want to create a pod you will be forced to create in a way that complies with the restricted PSS, by specifying runAsNonRoot, seccompProfile.type and capabilities and For example:
apiVersion: v1
kind: Pod
metadata:
name: test-1
spec:
restartPolicy: Never
securityContext:
runAsNonRoot: true
runAsUser: 1000
runAsGroup: 3000
fsGroup: 2000
seccompProfile:
type: RuntimeDefault
containers:
- name: test
image: ecerulm/ubuntu-tools:latest
imagePullPolicy: Always
command: ["/bin/bash", "-c", "--", "sleep 900"]
securityContext:
privileged: false
allowPrivilegeEscalation: false
capabilities:
drop:
- ALL

Debugging istio rate limiting handler

I'm trying to apply rate limiting on some of our internal services (inside the mesh).
I used the example from the docs and generated redis rate limiting configurations that include a (redis) handler, quota instance, quota spec, quota spec binding and rule to apply the handler.
This redis handler:
apiVersion: config.istio.io/v1alpha2
kind: handler
metadata:
name: redishandler
namespace: istio-system
spec:
compiledAdapter: redisquota
params:
redisServerUrl: <REDIS>:6379
connectionPoolSize: 10
quotas:
- name: requestcountquota.instance.istio-system
maxAmount: 10
validDuration: 100s
rateLimitAlgorithm: FIXED_WINDOW
overrides:
- dimensions:
destination: s1
maxAmount: 1
- dimensions:
destination: s3
maxAmount: 1
- dimensions:
destination: s2
maxAmount: 1
The quota instance (I'm only interested in limiting by destination at the moment):
apiVersion: config.istio.io/v1alpha2
kind: instance
metadata:
name: requestcountquota
namespace: istio-system
spec:
compiledTemplate: quota
params:
dimensions:
destination: destination.labels["app"] | destination.service.host | "unknown"
A quota spec, charging 1 per request if I understand correctly:
apiVersion: config.istio.io/v1alpha2
kind: QuotaSpec
metadata:
name: request-count
namespace: istio-system
spec:
rules:
- quotas:
- charge: 1
quota: requestcountquota
A quota binding spec that all participating services pre-fetch. I also tried with service: "*" which also did nothing.
apiVersion: config.istio.io/v1alpha2
kind: QuotaSpecBinding
metadata:
name: request-count
namespace: istio-system
spec:
quotaSpecs:
- name: request-count
namespace: istio-system
services:
- name: s2
namespace: default
- name: s3
namespace: default
- name: s1
namespace: default
# - service: '*' # Uncomment this to bind *all* services to request-count
A rule to apply the handler. Currently on all occasions (tried with matches but didn't change anything as well):
apiVersion: config.istio.io/v1alpha2
kind: rule
metadata:
name: quota
namespace: istio-system
spec:
actions:
- handler: redishandler
instances:
- requestcountquota
The VirtualService definitions are pretty similar for all participants:
apiVersion: networking.istio.io/v1alpha3
kind: VirtualService
metadata:
name: s1
spec:
hosts:
- s1
http:
- route:
- destination:
host: s1
The problem is nothing really happens and no rate limiting takes place. I tested with curl from pods inside the mesh. The redis instance is empty (no keys on db 0, which I assume is what the rate limiting would use) so I know it can't practically rate-limit anything.
The handler seems to be configured properly (how can I make sure?) because I had some errors in it which were reported in mixer (policy). There are still some errors but none which I associate to this problem or the configuration. The only line in which redis handler is mentioned is this:
2019-12-17T13:44:22.958041Z info adapters adapter closed all scheduled daemons and workers {"adapter": "redishandler.istio-system"}
But its unclear if its a problem or not. I assume its not.
These are the rest of the lines from the reload once I deploy:
2019-12-17T13:44:22.601644Z info Built new config.Snapshot: id='43'
2019-12-17T13:44:22.601866Z info adapters getting kubeconfig from: "" {"adapter": "kubernetesenv.istio-system"}
2019-12-17T13:44:22.601881Z warn Neither --kubeconfig nor --master was specified. Using the inClusterConfig. This might not work.
2019-12-17T13:44:22.602718Z info adapters Waiting for kubernetes cache sync... {"adapter": "kubernetesenv.istio-system"}
2019-12-17T13:44:22.903844Z info adapters Cache sync successful. {"adapter": "kubernetesenv.istio-system"}
2019-12-17T13:44:22.903878Z info adapters getting kubeconfig from: "" {"adapter": "kubernetesenv.istio-system"}
2019-12-17T13:44:22.903882Z warn Neither --kubeconfig nor --master was specified. Using the inClusterConfig. This might not work.
2019-12-17T13:44:22.904808Z info Setting up event handlers
2019-12-17T13:44:22.904939Z info Starting Secrets controller
2019-12-17T13:44:22.904991Z info Waiting for informer caches to sync
2019-12-17T13:44:22.957893Z info Cleaning up handler table, with config ID:42
2019-12-17T13:44:22.957924Z info adapters deleted remote controller {"adapter": "kubernetesenv.istio-system"}
2019-12-17T13:44:22.957999Z info adapters adapter closed all scheduled daemons and workers {"adapter": "prometheus.istio-system"}
2019-12-17T13:44:22.958041Z info adapters adapter closed all scheduled daemons and workers {"adapter": "redishandler.istio-system"}
2019-12-17T13:44:22.958065Z info adapters shutting down daemon... {"adapter": "kubernetesenv.istio-system"}
2019-12-17T13:44:22.958050Z info adapters shutting down daemon... {"adapter": "kubernetesenv.istio-system"}
2019-12-17T13:44:22.958096Z info adapters shutting down daemon... {"adapter": "kubernetesenv.istio-system"}
2019-12-17T13:44:22.958182Z info adapters shutting down daemon... {"adapter": "kubernetesenv.istio-system"}
2019-12-17T13:44:23.958109Z info adapters adapter closed all scheduled daemons and workers {"adapter": "kubernetesenv.istio-system"}
2019-12-17T13:55:21.042131Z info transport: loopyWriter.run returning. connection error: desc = "transport is closing"
2019-12-17T14:14:00.265722Z info transport: loopyWriter.run returning. connection error: desc = "transport is closing"
I'm using the demo profile with disablePolicyChecks: false to enable rate limiting. This is on istio 1.4.0, deployed on EKS.
I also tried memquota (this is our staging environment) with low limits and nothing seems to work. I never got a 429 no matter how much I went over the rate limit configured.
I don't know how to debug this and see where the configuration is wrong causing it to do nothing.
Any help is appreciated.

I too spent hours trying to decipher the documentation and get a sample working.
According to the documentation, they recommended that we enable policy checks:
https://istio.io/docs/tasks/policy-enforcement/rate-limiting/
However when that did not work, I did an "istioctl profile dump", searched for policy, and tried several settings.
I used Helm install and passed the following and then was able to get the described behaviour:
--set global.disablePolicyChecks=false \
--set values.pilot.policy.enabled=true \ ===> this made it work, but it's not in the docs.

How does Kubernetes Horizontal Pod Autoscaler calculate CPU Utilization for Multi Container Pods?

Question 1.)
Given the scenario a multi-container pod, where all containers have a defined CPU request:
How would Kubernetes Horizontal Pod Autoscaler calculate CPU Utilization for Multi Container pods?
Does it average them? (((500m cpu req + 50m cpu req) /2) * X% HPA target cpu utilization
Does it add them? ((500m cpu req + 50m cpu req) * X% HPA target cpu utilization
Does it track them individually? (500m cpu req * X% HPA target cpu utilization = target #1, 50m cpu req * X% HPA target cpu utilization = target #2.)
Question 2.)
Given the scenario of a multi-container pod, where 1 container has a defined CPU request and a blank CPU request for the other containers:
How would Kubernetes Horizontal Pod Autoscaler calculate CPU Utilization for Multi Container pods?
Does it work as if you only had a 1 container pod?
Question 3.)
Do the answers to questions 1 and 2 change based on the HPA API version? I noticed stable/nginx-ingress helm chart, chart version 1.10.2, deploys an HPA for me with these specs:
apiVersion: autoscaling/v1
kind: HorizontalPodAutoscaler
(I noticed apiVersion: autoscaling/v2beta2 now exists)
Background Info:
I recently had an issue with unexpected wild scaling / constantly going back and forth between min and max pods after adding a sidecar(2nd container) to an nginx ingress controller deployment (which is usually a pod with a single container). In my case, it was an oauth2 proxy, although I image istio sidecar container folks might run into this sort of problem all the time as well.
apiVersion: apps/v1
kind: Deployment
spec:
replicas: 3
template:
spec:
containers:
- name: nginx-ingress-controller #(primary-container)
resources:
requests:
cpu: 500m #baseline light load usage in my env
memory: 2Gi #according to kubectl top pods
limits:
memory: 8Gi #(oom kill pod if this high, because somethings wrong)
- name: oauth2-proxy #(newly-added-2nd-sidecar-container)
resources:
requests:
cpu: 50m
memory: 50Mi
limits:
memory: 4Gi
I have an HPA (apiVersion: autoscaling/v1) with:
min 3 replicas (to preserve HA during rolling updates)
targetCPUUtilizationPercentage = 150%
It occurred to me that my misconfiguration leads to unexpected wild scaling was caused by 2 issues:
I don't actually understand how HPAs work when the pod has multiple containers
I don't know how to dig deep to get metrics of what's going on.
To address the first issue: I brainstormed my understanding of how it works in the single container scenario (and then realized I don't know the multi-container scenario so I decided to ask this question)
This is my understanding of how HPA (autoscaling/v1) works when I have 1 container (temporarily ignore the 2nd container in the above deployment spec):
The HPA would spawn more replicas when the CPU utilization average of all pods shifted from my normal expected load of 500m or less to 750m (150% x 500m request)
To address the 2nd issue: I found out how to dig to see concrete numeric value-based metrics vs relative percentage-based metrics to help figure out what's happening behind the scenes:
bash# kubectl describe horizontalpodautoscaler nginx-ingress-controller -n=ingress | grep Metrics: -A 1
Metrics: ( current / target )
resource cpu on pods (as a percentage of request): 5% (56m) / 100%
(Note: kubectl top pods -n=ingress, showed cpu usage of the 5 replicas as 36m, 34m, 88m, 36m, 91m, so that 57m current which ~matches 56m current)
Also now it's a basic proportions Math Problem that allows solving for target static value:
(5% / 56m) = (100% / x m) --> x = 56 * 100 / 5 = 1120m target cpu
(Note: this HPA isn't associated with the deployment mentioned above, that's why the numbers are off.)

Basing on stackoverflow community member answer in other case
"HPA calculates pod cpu utilization as total cpu usage of all containers in pod divided by total request. I don't think that's specified in docs anywhere, but the relevant code is here"
You have got more informations,with examples in the link above.
Basing on documentation
Horizontal Pod Autoscaler automatically scales the number of pods in a replication controller, deployment or replica set based on observed CPU utilization (or, with beta support, on some other, application-provided metrics).
So basically:
apiVersion autoscaling/v1 HPA base on cpu.
apiVersion autoscaling/v2beta2 base on cpu,memory,custom metrics.
More informations here

For question 2 if we look at source code we can see that it looks all containers individually and returns if container doesn't have request.
func calculatePodRequests(pods []*v1.Pod, resource v1.ResourceName) (map[string]int64, error) {
requests := make(map[string]int64, len(pods))
for _, pod := range pods {
podSum := int64(0)
for _, container := range pod.Spec.Containers {
if containerRequest, ok := container.Resources.Requests[resource]; ok {
podSum += containerRequest.MilliValue()
} else {
return nil, fmt.Errorf("missing request for %s", resource)
}
}
requests[pod.Name] = podSum
}
return requests, nil
}

DigitalOcean pod has unbound immediate PersistentVolumeClaims

I am trying to run a Redis cluster in Kubernetes in DigitalOcean.
As a poc, I simply tried running an example I found online (https://github.com/sanderploegsma/redis-cluster/blob/master/redis-cluster.yml), which is able to spin up the pods appropriately when running locally using minikube.
However, when running it on Digital Ocean, I always get the following error:
Warning FailedScheduling 3s (x8 over 17s) default-scheduler pod has unbound immediate PersistentVolumeClaims (repeated 4 times)
Given that I am not changing anything, I am not sure why this would not work. Does anyone have any suggestions?
EDIT: some additional info
$ kubectl describe pvc
Name: data-redis-cluster-0
Namespace: default
StorageClass:
Status: Pending
Volume:
Labels: app=redis-cluster
Annotations: <none>
Finalizers: [kubernetes.io/pvc-protection]
Capacity:
Access Modes:
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Normal FailedBinding 3m19s (x3420 over 14h) persistentvolume-controller no persistent volumes available for this claim and no storage class is set
Mounted By: <none>
EDIT: setting the default storage class partially resolved the problem!
However, the node is now not able to find available volumes to bind:
kubectl describe pvc:
Name: data-redis-cluster-0
Namespace: default
StorageClass: local-storage
Status: Pending
Volume:
Labels: app=redis-cluster
Annotations: <none>
Finalizers: [kubernetes.io/pvc-protection]
Capacity:
Access Modes:
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Normal WaitForFirstConsumer 12m (x9 over 13m) persistentvolume-controller waiting for first consumer to be created before binding
Normal WaitForFirstConsumer 3m19s (x26 over 9m34s) persistentvolume-controller waiting for first consumer to be created before binding
kubectl describe pod redis-cluster-0
....
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Warning FailedScheduling 16m (x25 over 17m) default-scheduler 0/5 nodes are available: 1 node(s) had taints that the pod didn't tolerate, 4 node(s) didn't find available persistent volumes to bind.
kubectl describe sc
Name: local-storage
IsDefaultClass: Yes
Annotations: storageclass.kubernetes.io/is-default-class=true
Provisioner: kubernetes.io/no-provisioner
Parameters: <none>
AllowVolumeExpansion: <unset>
MountOptions: <none>
ReclaimPolicy: Delete
VolumeBindingMode: WaitForFirstConsumer
Events: <none>
kubernetes manager pod logs:
I1028 15:30:56.154131 1 event.go:221] Event(v1.ObjectReference{Kind:"StatefulSet", Namespace:"default", Name:"redis-cluster", UID:"7528483e-dac6-11e8-871f-2e55450d570e", APIVersion:"apps/v1", ResourceVersion:"2588806", FieldPath:""}): type: 'Normal' reason: 'SuccessfulCreate' create Claim data-redis-cluster-0 Pod redis-cluster-0 in StatefulSet redis-cluster success
I1028 15:30:56.166649 1 event.go:221] Event(v1.ObjectReference{Kind:"PersistentVolumeClaim", Namespace:"default", Name:"data-redis-cluster-0", UID:"76746506-dac6-11e8-871f-2e55450d570e", APIVersion:"v1", ResourceVersion:"2588816", FieldPath:""}): type: 'Normal' reason: 'WaitForFirstConsumer' waiting for first consumer to be created before binding
I1028 15:30:56.220464 1 event.go:221] Event(v1.ObjectReference{Kind:"StatefulSet", Namespace:"default", Name:"redis-cluster", UID:"7528483e-dac6-11e8-871f-2e55450d570e", APIVersion:"apps/v1", ResourceVersion:"2588806", FieldPath:""}): type: 'Normal' reason: 'SuccessfulCreate' create Pod redis-cluster-0 in StatefulSet redis-cluster successful
I1028 15:30:57.004631 1 event.go:221] Event(v1.ObjectReference{Kind:"PersistentVolumeClaim", Namespace:"default", Name:"data-redis-cluster-0", UID:"76746506-dac6-11e8-871f-2e55450d570e", APIVersion:"v1", ResourceVersion:"2588825", FieldPath:""}): type: 'Normal' reason: 'WaitForFirstConsumer' waiting for first consumer to be created before binding

This:
no storage class is set
And an empty output for kubectl describe sc means that there's no storage class.
I recommend installing the CSI-driver for Digital Ocean. That will create a do-block-storage class using the Kubernetes CSI interface.
Another option is to use local storage. Using a local storage class:
$ cat <<EOF
kind: StorageClass
apiVersion: storage.k8s.io/v1
metadata:
name: local-storage
provisioner: kubernetes.io/no-provisioner
volumeBindingMode: WaitForFirstConsumer
EOF | kubectl apply -f -
Then for either case you may need to set it as a default storage class if you don't specify storageClassName in your PVC:
$ kubectl patch storageclass local-storage -p '{"metadata": {"annotations":{"storageclass.kubernetes.io/is-default-class":"true"}}}'
or
$ kubectl patch storageclass do-block-storage -p '{"metadata": {"annotations":{"storageclass.kubernetes.io/is-default-class":"true"}}}'

It is a statefulSet using PersistentVolumeClaims
You need to configure a default storageClass in your cluster so that the PersistentVolumeClaim can take the storage from there.
In minikube one is already available so it succeeds without error:
C02W84XMHTD5:ucp iahmad$ kubectl get sc --all-namespaces
NAME PROVISIONER AGE
standard (default) k8s.io/minikube-hostpath 7d

Kubernetes dashboard authentication on atomic host

I am a total newbie in terms of kubernetes/atomic host, so my question may be really trivial or well discussed already - but unfortunately i couldn't find any clues how to achieve my goal - that's why i am here.
I have set up kubernetes cluster on atomic hosts (right now i have just one master and one node). I am working in the cloud network, on the virtual machines.
[root#master ~]# kubectl get node
NAME STATUS AGE
192.168.2.3 Ready 9d
After a lot of fuss i managed to set up the kubernetes dashboard UI on my master.
[root#master ~]# kubectl describe pod --namespace=kube-system
Name: kubernetes-dashboard-3791223240-8jvs8
Namespace: kube-system
Node: 192.168.2.3/192.168.2.3
Start Time: Thu, 07 Sep 2017 10:37:31 +0200
Labels: k8s-app=kubernetes-dashboard
pod-template-hash=3791223240
Status: Running
IP: 172.16.43.2
Controllers: ReplicaSet/kubernetes-dashboard-3791223240
Containers:
kubernetes-dashboard:
Container ID: docker://8fddde282e41d25c59f51a5a4687c73e79e37828c4f7e960c1bf4a612966420b
Image: gcr.io/google_containers/kubernetes-dashboard-amd64:v1.6.3
Image ID: docker-pullable://gcr.io/google_containers/kubernetes-dashboard-amd64#sha256:2c4421ed80358a0ee97b44357b6cd6dc09be6ccc27dfe9d50c9bfc39a760e5fe
Port: 9090/TCP
Args:
--apiserver-host=http://192.168.2.2:8080
Limits:
cpu: 100m
memory: 300Mi
Requests:
cpu: 100m
memory: 100Mi
State: Running
Started: Fri, 08 Sep 2017 10:54:46 +0200
Last State: Terminated
Reason: Error
Exit Code: 2
Started: Thu, 07 Sep 2017 10:37:32 +0200
Finished: Fri, 08 Sep 2017 10:54:44 +0200
Ready: True
Restart Count: 1
Liveness: http-get http://:9090/ delay=30s timeout=30s period=10s #success=1 #failure=3
Volume Mounts: <none>
Environment Variables: <none>
Conditions:
Type Status
Initialized True
Ready True
PodScheduled True
No volumes.
QoS Class: Burstable
Tolerations: <none>
Events:
FirstSeen LastSeen Count From SubObjectPath Type Reason Message
--------- -------- ----- ---- ------------- -------- ------ -------
1d 32m 3 {kubelet 192.168.2.3} Warning MissingClusterDNS kubelet does not have ClusterDNS IP configured and cannot create Pod using "ClusterFirst" policy. Falling back to DNSDefault policy.
1d 32m 2 {kubelet 192.168.2.3} spec.containers{kubernetes-dashboard} Normal Pulled Container image "gcr.io/google_containers/kubernetes-dashboard-amd64:v1.6.3" already present on machine
32m 32m 1 {kubelet 192.168.2.3} spec.containers{kubernetes-dashboard} Normal Created Created container with docker id 8fddde282e41; Security:[seccomp=unconfined]
32m 32m 1 {kubelet 192.168.2.3} spec.containers{kubernetes-dashboard} Normal Started Started container with docker id 8fddde282e41
also
[root#master ~]# kubectl cluster-info
Kubernetes master is running at http://localhost:8080
kubernetes-dashboard is running at http://localhost:8080/api/v1/proxy/namespaces/kube-system/services/kubernetes-dashboard
Now, when i tried connecting to the dashboard (i tried accessing the dashbord via the browser on windows virtual machine in the same cloud network) using the adress:
https://192.168.218.2:6443/api/v1/proxy/namespaces/kube-system/services/kubernetes-dashboard
I am getting the "unauthorized". I believe it proves that the dashboard is indeed running under this address, but i need to set up some way of accessing it?
What i want to achieve in the long term:
i want to enable connecting to the dashboard using the login/password (later, when i learn a bit more, i will think about authenticating by certs or somehting more safe than password) from the outside of the cloud network. For now, connecting to the dashboard at all would do.
I know there are threads about authenticating, but most of them are mentioning something like:
Basic authentication is enabled by passing the
--basic-auth-file=SOMEFILE option to API server
And this is the part i cannot cope with - i have no idea how to pass options to API server.
On the atomic host the api-server,kube-controller-manager and kube-scheduler are running in containers, so I get into the api-server container with command:
docker exec -it kube-apiserver.service bash
I saw few times that i should edit .json file in /etc/kubernetes/manifest directory, but unfortunately there is no such file (or even a directory).
I apologize if my problem is too trivial or not described well enough, but im new to (both) IT world and the stackoverflow.
I would love to provide more info, but I am afraid I would end up including lots of useless information, so i decided to wait for your instructions in that regard.

Check out wiki pages of kubernetes dashboard they describe how to get access to dashboard and how to authenticate to it. For quick access you can run:
kubectl proxy
And then go to following address:
http://localhost:8001/api/v1/namespaces/kube-system/services/https:kubernetes-dashboard:/proxy
You'll see two options, one of them is uploading your ~/.kube/config file and the other one is using a token. You can get a token by running following command:
kubectl -n kube-system describe secret $(kubectl -n kube-system get secret | grep service-account-token | head -n 1 | awk '{print $1}')
Now just copy and paste the long token string into dashboard prompt and you're done.

We Keep Coding

sql objective-c vba vb.net react-native apache vue.js tensorflow api pandas

Unable to request cluster with one GPU on GKE - gpu

Looks like you need to install the NVIDIA CPU device driver on your worker node(s). Running kubectl apply -f https://raw.githubusercontent.com/GoogleCloudPlatform/container-engine-accelerators/master/nvidia-driver-installer/cos/daemonset-preloaded.yaml should do the trick.

The best solution as I see it is to request a quota increase in the IAM & Admin Quotas page. As for the reason this is happening, I can only imagine that both the node and the pod are requesting GPUs, but only the node is getting it because of the capped quota.

Related

How can I configure the AdmissionConfiguration > PodSecurity > PodSecurityConfiguration in an EKS cluster?

Debugging istio rate limiting handler

How does Kubernetes Horizontal Pod Autoscaler calculate CPU Utilization for Multi Container Pods?

DigitalOcean pod has unbound immediate PersistentVolumeClaims

Kubernetes dashboard authentication on atomic host

Categories

Resources