After installing the NGINX ingress controller from its Helm chart, the relevant pods are not starting because the image pull fails with an SSL issue.
Pod describe:
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Normal Scheduled <unknown> default-scheduler Successfully assigned xxx/xxxxxxx-nginx-ingress-controller-649b45b7c6-26ggh to aks-agentpool-21683929-0
Normal Pulling 16s (x2 over 29s) kubelet, aks-agentpool-21683929-0 Pulling image "quay.io/kubernetes-ingress-controller/nginx-ingress-controller:0.32.0"
Warning Failed 16s (x2 over 29s) kubelet, aks-agentpool-21683929-0 Failed to pull image "quay.io/kubernetes-ingress-controller/nginx-ingress-controller:0.32.0": rpc error: code = Unknown desc = Error response from daemon: Get https://quay.io/v2/: x509: certificate signed by unknown authority
Warning Failed 16s (x2 over 29s) kubelet, aks-agentpool-21683929-0 Error: ErrImagePull
Normal BackOff 3s (x3 over 28s) kubelet, aks-agentpool-21683929-0 Back-off pulling image "quay.io/kubernetes-ingress-controller/nginx-ingress-controller:0.32.0"
Warning Failed 3s (x3 over 28s) kubelet, aks-agentpool-21683929-0 Error: ImagePullBackOff
Should I trust or ignore the certificate at the K8s API server, or in the pods? I couldn't find the right information online. Any help would be greatly appreciated.
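For what it's worth, neither the API server nor the pods are involved in an image pull — it is the node's container runtime that contacts quay.io, so if a corporate proxy (or other middlebox) is re-signing TLS traffic, trust has to be established on the nodes themselves. A minimal sketch, assuming Ubuntu-based AKS nodes and that the intercepting CA's certificate is available locally as proxy-ca.crt (both of these are assumptions):

```shell
# Run on each node (e.g. via SSH or a privileged DaemonSet). Paths assume an
# Ubuntu node image; proxy-ca.crt is a placeholder for the intercepting CA.
sudo cp proxy-ca.crt /usr/local/share/ca-certificates/proxy-ca.crt
sudo update-ca-certificates
# Restart the container runtime so it picks up the updated trust store.
sudo systemctl restart docker    # or: sudo systemctl restart containerd
```

After the restart, deleting the failing pod lets the Deployment recreate it and retry the pull.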
Related
Following the super-simple instructions in this link (the Helm method), I'm trying to install Kong on a test EKS cluster (in an empty kong namespace), and it's a disaster. The service never gets an external IP (stuck at "pending"). The ingress-controller container inside the main pod fails with:
time="2022-06-15T19:14:52Z" level=info msg="successfully synced configuration to kong."
time="2022-06-15T19:14:54Z" level=error msg="checking config status failed: %!w(*kong.APIError=&{500 An unexpected error occurred})"
time="2022-06-15T19:14:57Z" level=error msg="checking config status failed: %!w(*kong.APIError=&{500 An unexpected error occurred})"
time="2022-06-15T19:15:00Z" level=error msg="checking config status failed: %!w(*kong.APIError=&{500 An unexpected error occurred})"
(the last line repeating every 3sec)
...while the proxy container inside the same pod repeatedly fails with:
2022/06/15 19:47:27 [error] 1110#0: *11302 [lua] api_helpers.lua:511: handle_error(): /usr/local/share/lua/5.1/lapis/application.lua:424: /usr/local/share/lua/5.1/kong/api/routes/health.lua:45: http2 requests not supported yet
I'm not doing any customizing with the values file; I'm installing it as it comes by default. The instructions (just a helm repo add and a helm install) come from the official Kong site, so what is amiss? Helm is v3.8, K8s is v1.21.
I have a Hyperledger Fabric network v2.2.0 deployed with 2 peer orgs and an orderer org in a Kubernetes cluster. Each org has its own CA server. The CA pod sometimes keeps restarting. To know whether the CA server's service is reachable or not, I am trying to use the healthz API on port 9443.
I have used the livenessProbe condition in the CA deployment like so:
livenessProbe:
  failureThreshold: 3
  httpGet:
    path: /healthz
    port: 9443
    scheme: HTTP
  initialDelaySeconds: 10
  periodSeconds: 10
  successThreshold: 1
  timeoutSeconds: 1
After configuring this liveness probe, the pod keeps on restarting with the event Liveness probe failed: HTTP probe failed with status code: 400. Why might this be happening?
HTTP 400 code:
The HTTP 400 Bad Request response status code indicates that the server cannot or will not process the request due to something that is perceived to be a client error (for example, malformed request syntax, invalid request message framing, or deceptive request routing).
This indicates that Kubernetes is sending the request in a way Hyperledger is rejecting, but without more information it is hard to say where the problem is. Some quick checks to start with:
Send some GET requests directly to the hyperledger /healthz resource yourself. What do you get? You should get back either a 200 "OK" if everything is functioning, or a 503 "Service Unavailable" with details of which nodes are down (docs).
kubectl describe pod liveness-request. You should see a few lines towards the bottom describing the state of the liveness probe in more detail:
Restart Count: 0
...
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Normal Scheduled <unknown> default-scheduler Successfully assigned example-dc/liveness-request to dcpoz-d-sou-k8swor3
Normal Pulling 4m45s kubelet, dcpoz-d-sou-k8swor3 Pulling image "nginx"
Normal Pulled 4m42s kubelet, dcpoz-d-sou-k8swor3 Successfully pulled image "nginx"
Normal Created 4m42s kubelet, dcpoz-d-sou-k8swor3 Created container liveness
Normal Started 4m42s kubelet, dcpoz-d-sou-k8swor3 Started container liveness
Some other things to investigate:
httpGet options that might be helpful:
scheme – Protocol type HTTP or HTTPS
httpHeaders – Custom headers to set in the request
Have you configured the operations service?
You may need a valid client certificate (if TLS is enabled, and clientAuthRequired is set to true).
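On that last point: if the operations listener has TLS enabled, a probe with scheme: HTTP would get rejected, so switching the probe to HTTPS is the first thing to try. A sketch using a strategic-merge patch, where the Deployment name org1-ca and container name ca are hypothetical placeholders for your actual manifest:

```shell
# Patch the liveness probe to use HTTPS; the path and port are unchanged
# from the question, only the scheme differs.
kubectl patch deployment org1-ca -p '
spec:
  template:
    spec:
      containers:
      - name: ca
        livenessProbe:
          httpGet:
            path: /healthz
            port: 9443
            scheme: HTTPS
          initialDelaySeconds: 10
          periodSeconds: 10
'
```

Note that an httpGet probe cannot present a client certificate, so if clientAuthRequired is true you would need an exec probe (e.g. running curl with the client cert) instead.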
I'm trying to create a minimal cluster with 1 node and 1 GPU/node. My command:
gcloud container clusters create cluster-gpu --num-nodes=1 --zone=us-central1-a --machine-type="n1-highmem-2" --accelerator="type=nvidia-tesla-k80,count=1" --scopes="gke-default,storage-rw"
creates the cluster. Now when the following pod.yaml
apiVersion: v1
kind: Pod
metadata:
  name: gke-training-pod-gpu
spec:
  containers:
  - name: my-custom-container
    image: gcr.io/.../object-classification:gpu
    resources:
      limits:
        nvidia.com/gpu: 1
is applied to my cluster, I can see in the GKE dashboard that the gke-training-pod-gpu pod is never created. When I do the same as above, only replacing num-nodes=1 by num-nodes=2, this time I get the following error:
ERROR: (gcloud.container.clusters.create) ResponseError: code=403, message=Insufficient regional quota to satisfy request: resource "NVIDIA_K80_GPUS": request requires '2.0' and is short '1.0'. project has a quota of '1.0' with '1.0' available. View and manage quotas at https://console.cloud.google.com/iam-admin/quotas?usage=USED&project=...
Is there any way to use a GPU when the quota is 1?
EDIT:
when the pod has been created with kubectl apply, kubectl describe pod gke-training-pod-gpu shows the following event:
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Warning FailedScheduling 48s (x2 over 48s) default-scheduler 0/1 nodes are available: 1 Insufficient nvidia.com/gpu.
Looks like you need to install the NVIDIA GPU device driver on your worker node(s).
Running
kubectl apply -f https://raw.githubusercontent.com/GoogleCloudPlatform/container-engine-accelerators/master/nvidia-driver-installer/cos/daemonset-preloaded.yaml
should do the trick.
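Once the driver DaemonSet is running, the node should start advertising the GPU as an allocatable resource; a quick way to verify (sketch):

```shell
# Each GPU node should report "1" for nvidia.com/gpu once the driver is up.
kubectl get nodes -o jsonpath='{.items[*].status.allocatable.nvidia\.com/gpu}'
# The pending pod's scheduling events should then change accordingly:
kubectl describe pod gke-training-pod-gpu
```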
The best solution as I see it is to request a quota increase in the IAM & Admin Quotas page.
As for the reason this is happening, I can only imagine that both the node and the pod are requesting GPUs, but only the node is getting it because of the capped quota.
We've been following the guide for automatic sidecar injection in istio-0.5.0 on kubernetes 1.9.2, but have so far been unsuccessful due to certificate issues on the api-server.
When pods are created, the webhook is called, but the api-server rejects the certificate presented by istio-sidecar-injector/inject, stating:
W0205 09:15:27.389473 1 admission.go:257] Failed calling webhook, failing open sidecar-injector.istio.io: failed calling admission webhook "sidecar-injector.istio.io": Post https://istio-sidecar-injector.istio-system.svc:443/inject: x509: certificate signed by unknown authority
E0205 09:15:27.389501 1 admission.go:258] failed calling admission webhook "sidecar-injector.istio.io": Post https://istio-sidecar-injector.istio-system.svc:443/inject: x509: certificate signed by unknown authority
Our API server has been configured with the following flags:
- --allow-privileged=true
- --kubelet-client-certificate=/etc/kubernetes/pki/admin.pem
- --kubelet-client-key=/etc/kubernetes/pki/admin-key.pem
- --kubelet-preferred-address-types=InternalIP,ExternalIP,Hostname
- --tls-ca-file=/etc/kubernetes/pki/ca.pem
- --tls-cert-file=/etc/kubernetes/pki/kube-apiserver-server.pem
- --tls-private-key-file=/etc/kubernetes/pki/kube-apiserver-server-key.pem
- --secure-port=6443
- --enable-bootstrap-token-auth
- --storage-backend=etcd3
- --service-cluster-ip-range=10.254.0.0/16
- --service-account-key-file=/etc/kubernetes/pki/sa.pub
- --client-ca-file=/etc/kubernetes/pki/ca.pem
- --insecure-port=8080
- --insecure-bind-address=127.0.0.1
- --admission-control=MutatingAdmissionWebhook,Initializers,NamespaceLifecycle,LimitRanger,ServiceAccount,PersistentVolumeLabel,DefaultStorageClass,ResourceQuota,DefaultTolerationSeconds
- --authorization-mode=RBAC
- --oidc-issuer-url=https://sts.windows.net/[...removed...]/
- --oidc-client-id=spn:[...removed...]
- --oidc-username-claim=upn
- --oidc-groups-claim=groups
- --v=0
- --advertise-address=10.1.1.200
- --etcd-servers=http://etcd-0:2379,http://etcd-1:2379,http://etcd-2:2379
The certificate has been signed by the CA in ca.pem, which we have given to the api-server via the --tls-ca-file flag, but still no cigar.
Any ideas out there on how we can get the kubernetes API admission controller to trust the certificate presented by the sidecar-injector?
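One thing worth checking: for admission webhooks, the API server validates the webhook's serving certificate against the caBundle field inside the webhook configuration itself, not against --tls-ca-file. A sketch of patching the bundle in, assuming the configuration object is named istio-sidecar-injector (verify with kubectl get mutatingwebhookconfiguration) and that ca.pem is the CA that signed the injector's cert:

```shell
# Base64-encode the CA certificate and patch it into the webhook's
# clientConfig so the API server can verify the serving cert.
CA_BUNDLE=$(base64 -w0 < /etc/kubernetes/pki/ca.pem)
kubectl patch mutatingwebhookconfiguration istio-sidecar-injector --type=json \
  -p="[{\"op\":\"replace\",\"path\":\"/webhooks/0/clientConfig/caBundle\",\"value\":\"${CA_BUNDLE}\"}]"
```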
I know it's experimental; I am trying to use docker-compose to build Spinnaker. I am seeing an error when trying to browse localhost:9000, which tries to redirect to this page:
http://localhost:8084/auth/redirectto=http%3A%2F%2Flocalhost%3A9000%2F%23%2Finfrastructure
Looks like it's either a Fiat or a Gate issue. I tried adding a proxy to apache2.
Errors in Fiat:
RetrofitError: unexpected url: front50/serviceAccounts
2017-09-15 19:24:31.642 WARN 1 --- [ont50Service-10]
c.n.s.f.p.internal.Front50Service : [] Falling back to service
account cache. Cause: unexpected url: front50/serviceAccounts
2017-09-15 19:24:31.645 WARN 1 --- [ecutionAction-1]
c.n.s.fiat.roles.UserRolesSyncer : [] User permission sync
failed. Server status is DOWN. Trying again in 10000 ms. Cause:
(Provider: DefaultServiceAccountProvider) retrofit.RetrofitError:
unexpected url: front50/serviceAccounts
Errors in Gate:
2017-09-15 19:18:19.386 ERROR 1 --- [ost-startStop-1]
o.s.b.b.PropertiesConfigurationFactory : Properties configuration
failed validation
2017-09-15 19:18:19.394 ERROR 1 --- [ost-startStop-1]
o.s.b.b.PropertiesConfigurationFactory : Field error in object
'target' on field 'services[ORCA_HOST]': rejected value [orca]; codes
For the errors in Fiat, you can configure the environment variables in Docker Compose like so:
environment:
- "SERVICES_CLOUDDRIVER_BASEURL=http://clouddriver:7002"
- "SERVICES_FRONT50_BASEURL=http://front50:8080"
Update: this workaround works for Gate:
environment:
- "services.clouddriver.host=clouddriver"
- "services.echo.host=echo"
- "services.front50.host=front50"
- "services.igor.host=igor"
- "services.orca.host=orca"
- "services.rosco.host=rosco"
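One way to apply these settings without editing the main compose file is an override file, which docker-compose merges automatically; a sketch, where the service name gate is an assumption based on a typical Spinnaker compose layout:

```shell
# Write the overrides next to docker-compose.yml, then recreate the service.
cat <<'EOF' > docker-compose.override.yml
services:
  gate:
    environment:
      - services.clouddriver.host=clouddriver
      - services.front50.host=front50
      - services.orca.host=orca
EOF
docker-compose up -d gate
```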