Scaling AKS from 1 to 2 nodes: cannot create pods on Node0 and exec (or view logs) on Node1 - azure-container-service

After scaling AKS from 1 to 2 nodes, I ran into 2 issues.
Node0 cannot create new pods. They get stuck in ContainerCreating status with:
Failed create pod sandbox. Error syncing pod.
On Node1, I cannot EXEC into or view LOGS of pods:
the server has asked for the client to provide credentials.
Kubernetes version: 1.8.7
Please advise, thanks!!
UPDATE Jun 14, 2018
Issue 1: RESOLVED by restarting the node from the Azure Portal and kubectl drain (see the sketch below).
Issue 2: NOT RESOLVED.
Added some screenshots for Issue 2 (cannot EXEC or view LOGS):
Logs screenshot
Exec screenshot
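For reference, a minimal sketch of the drain-and-restart sequence that resolved Issue 1; the node name aks-nodepool1-12345678-0 is a placeholder:
# cordon and drain the affected node (flags as of kubectl 1.8)
kubectl drain aks-nodepool1-12345678-0 --ignore-daemonsets --delete-local-data
# restart the node VM from the Azure Portal, then allow scheduling again
kubectl uncordon aks-nodepool1-12345678-0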

Related

Digital Ocean droplet & gitlab runner problem

I am currently working on GitLab CI/CD and want to set up a runner on a DigitalOcean droplet; however, I get the following error:
$ docker network create web
Cannot connect to the Docker daemon at unix:///var/run/docker.sock. Is the docker daemon running?
Cleaning up project directory and file based variables
00:00
ERROR: Job failed: exit code 1
How can I avoid this problem, considering that Docker is up and running on the droplet (Ubuntu, 8 GB RAM)?
It can only be one of these reasons:
a) gitlab-runner user is not in docker group
id gitlab-runner
should show something like
uid=998(gitlab-runner) gid=998(gitlab-runner) groups=998(gitlab-runner),1001(docker)
b) the docker service is not running on the droplet
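A minimal sketch for both cases, assuming a systemd-based Ubuntu droplet with Docker and gitlab-runner already installed:
# case (a): add the gitlab-runner user to the docker group
sudo usermod -aG docker gitlab-runner
# restart the runner so the new group membership is picked up
sudo systemctl restart gitlab-runner
# case (b): verify the docker daemon itself is running
sudo systemctl status docker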

How to connect to an MSK cluster from an EKS cluster

I am having difficulties connecting to my MSK cluster from my EKS cluster even though both clusters share the same VPC and the same subnets.
The security group used by the MSK cluster has the following inbound rules:
type          protocol   port range   source
all traffic   all        all          custom (SG_ID)
all traffic   all        all          anywhere IPv4 (0.0.0.0/0)
where SG_ID is the EKS cluster's security group, the one labeled "EKS created security group applied...".
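(For reference, a narrower variant of the first rule, allowing just the Kafka plaintext port 9092 from the EKS cluster security group, could be created with the AWS CLI; the group IDs below are placeholders:)
# allow TCP 9092 into the MSK security group from the EKS cluster security group
aws ec2 authorize-security-group-ingress --group-id sg-0123456789abcdef0 --protocol tcp --port 9092 --source-group sg-0fedcba9876543210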
In the EKS cluster, I am using the following commands to test connectivity:
kubectl run kafka-consumer \
-ti \
--image=quay.io/strimzi/kafka:latest-kafka-2.8.1 \
--rm=true \
--restart=Never \
-- bin/kafka-topics.sh --create --topic test --bootstrap-server b-1.test.z35y0w.c4.kafka.us-east-1.amazonaws.com:9092 --replication-factor 2 --partitions 1 --if-not-exists
With the following result
Error while executing topic command : Call(callName=createTopics, deadlineMs=1635906680860, tries=1, nextAllowedTryMs=1635906680961) timed out at 1635906680861 after 1 attempt(s)
[2021-11-03 02:31:20,865] ERROR org.apache.kafka.common.errors.TimeoutException: Call(callName=createTopics, deadlineMs=1635906680860, tries=1, nextAllowedTryMs=1635906680961) timed out at 1635906680861 after 1 attempt(s)
Caused by: org.apache.kafka.common.errors.TimeoutException: Timed out waiting for a node assignment. Call: createTopics
(kafka.admin.TopicCommand$)
pod "kafka-consumer" deleted
pod default/kafka-consumer terminated (Error)
Sadly, the second bootstrap server displayed on the MSK Page gives the same result.
nc eventually times out
kubectl run busybox -ti --image=busybox --rm=true --restart=Never -- nc b-2.test.z35y0w.c4.kafka.us-east-1.amazonaws.com
nslookup fails as well
kubectl run busybox -ti --image=busybox --rm=true --restart=Never -- nslookup b-2.test.z35y0w.c4.kafka.us-east-1.amazonaws.com
If you don't see a command prompt, try pressing enter.
*** Can't find b-2.test.z35y0w.c4.kafka.us-east-1.amazonaws.com: No answer
Could anyone please give me a hint?
Thanks
I needed to connect to MSK from my EKS pod, so I searched this doc. I want to share my solution and hope it can help others.
This is my config file:
root@kain:~/work# cat kafkaconfig
security.protocol=SASL_SSL
sasl.mechanism=AWS_MSK_IAM
sasl.jaas.config=software.amazon.msk.auth.iam.IAMLoginModule required;
sasl.client.callback.handler.class=software.amazon.msk.auth.iam.IAMClientCallbackHandler
This is my command:
./kafka-topics.sh --list --bootstrap-server <My MSK bootstrap server>:9098 --command-config ./kafkaconfig
For this command, there are 2 preconditions we need to satisfy.
One is that you have access to AWS MSK (I access MSK from my EKS pod, and my EKS pod has OIDC access to AWS).
The second is that we need the AWS auth jar file: aws-msk-iam-auth.jar
address: https://github.com/aws/aws-msk-iam-auth/releases
Put it in the Kafka client's libs directory, or export CLASSPATH=/aws-msk-iam-auth-1.1.4-all.jar
reference doc: https://aws.amazon.com/blogs/big-data/securing-apache-kafka-is-easy-and-familiar-with-iam-access-control-for-amazon-msk/
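As a rough end-to-end sketch inside the pod (the download URL follows the v1.1.4 release naming above and is an assumption; the bootstrap server is a placeholder):
# fetch the IAM auth jar and make it visible to the Kafka CLI tools
curl -LO https://github.com/aws/aws-msk-iam-auth/releases/download/v1.1.4/aws-msk-iam-auth-1.1.4-all.jar
export CLASSPATH=$PWD/aws-msk-iam-auth-1.1.4-all.jar
# list topics over the IAM-authenticated port 9098 using the config file shown above
./kafka-topics.sh --list --bootstrap-server <my-msk-bootstrap-server>:9098 --command-config ./kafkaconfig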

Unable to create AKS cluster in westeurope location

Trying to set up an AKS cluster in the westeurope location using this guide, but it keeps failing at this step.
When executing this command:
az aks create --location westeurope --resource-group <myResourceGroup> --name <myAKSCluster> --node-count 1 --generate-ssh-keys
I continuously get the following error message:
Operation failed with status: 'Bad Request'. Details: The VM size of Agent is not allowed in your subscription in location 'westeurope'. Agent VM size 'Standard_DS1_v2' is available in locations: australiaeast,australiasoutheast,brazilsouth,canadacentral,canadaeast,centralindia,centralus,centraluseuap,eastasia,eastus,eastus2euap,japaneast,japanwest,koreacentral,koreasouth,northcentralus,northeurope,southcentralus,southindia,uksouth,ukwest,westcentralus,westindia,westus,westus2.
Even when I explicitly set the VM size to a different type of VM I still get a similar error. For example:
az aks create --location westeurope --resource-group <myResourceGroup> --name <myAKSCluster> --node-vm-size Standard_B1s --node-count 1 --generate-ssh-keys
results in:
Operation failed with status: 'Bad Request'. Details: The VM size of Agent is not allowed in your subscription in location 'westeurope'. Agent VM size 'Standard_B1s' is available in locations: australiaeast,australiasoutheast,brazilsouth,canadacentral,canadaeast,centralindia,centralus,centraluseuap,eastasia,eastus,eastus2euap,japaneast,japanwest,koreacentral,koreasouth,northcentralus,northeurope,southcentralus,southindia,uksouth,ukwest,westcentralus,westindia,westus,westus2.
It looks like creating an AKS cluster in westeurope is forbidden / not possible at all. Has anybody created a cluster in this location successfully?
This is a common problem at the moment for westeurope; it looks like a bug in Azure AKS. The VMs can be created through "Virtual machines" but not through AKS.
Here is a different thread on this topic: https://github.com/Azure/AKS/issues/280
You just need to add --node-vm-size Standard_D2s_v3 to your command. It resolved my issue.
Note: you need to pass a VM size that your region supports; for example, my region WestUS supports Standard_d16ads_v5. The command in the question will return the available VM sizes in the exception message.
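For example, a hedged sketch of checking which sizes your subscription can use in a region and then passing one explicitly (Standard_D2s_v3 worked for the answerer; pick a size the SKU list shows as unrestricted for you):
# list VM SKUs available to this subscription in West Europe, including any restrictions
az vm list-skus --location westeurope --output table
# then create the cluster with a size from that list
az aks create --location westeurope --resource-group <myResourceGroup> --name <myAKSCluster> --node-vm-size Standard_D2s_v3 --node-count 1 --generate-ssh-keys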

Ambari cluster : Host registration failed

I am setting up an Ambari cluster with 3 VirtualBox VMs running Ubuntu 16.04 LTS.
I followed this Hortonworks tutorial.
However, when I go to create a cluster using the Ambari Cluster Install Wizard, I get the error below during step 3 - "Confirm Hosts".
26 Jun 2017 16:41:11,553 WARN [Thread-34] BSRunner:292 - Bootstrap process timed out. It will be destroyed.
26 Jun 2017 16:41:11,554 INFO [Thread-34] BSRunner:309 - Script log Mesg
INFO:root:BootStrapping hosts ['thanuja.ambari-agent1.com', 'thanuja.ambari-agent2.com'] using /usr/lib/python2.6/site-packages/ambari_server cluster primary OS: ubuntu16 with user 'thanuja'with ssh Port '22' sshKey File /var/run/ambari-server/bootstrap/5/sshKey password File null using tmp dir /var/run/ambari-server/bootstrap/5 ambari: thanuja.ambari-server.com; server_port: 8080; ambari version: 2.5.0.3; user_run_as: root
INFO:root:Executing parallel bootstrap
Bootstrap process timed out. It was destroyed.
I have read a number of posts saying that this is related to not enabling password-less SSH to the hosts, but I can SSH to the hosts from the server without a password.
I am running Ambari as a non-root user with root privileges.
This post helped me.
I modified the users on the host machines so that they can execute sudo commands without a password, using the visudo command.
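A minimal sketch of that sudoers change, assuming the Ambari bootstrap user is thanuja (the user shown in the log above); run visudo and add:
# allow the ambari bootstrap user to run sudo without a password
thanuja ALL=(ALL) NOPASSWD:ALL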
Please post if you have any alternative answers.

SSH into Kubernetes cluster running on Amazon

Created a 2 node Kubernetes cluster as:
KUBERNETES_PROVIDER=aws NUM_NODES=2 kube-up.sh
This shows the output as:
Found 2 node(s).
NAME STATUS AGE
ip-172-20-0-226.us-west-2.compute.internal Ready 57s
ip-172-20-0-227.us-west-2.compute.internal Ready 55s
Validate output:
NAME STATUS MESSAGE ERROR
controller-manager Healthy ok
scheduler Healthy ok
etcd-0 Healthy {"health": "true"}
etcd-1 Healthy {"health": "true"}
Cluster validation succeeded
Done, listing cluster services:
Kubernetes master is running at https://52.33.9.1
Elasticsearch is running at https://52.33.9.1/api/v1/proxy/namespaces/kube-system/services/elasticsearch-logging
Heapster is running at https://52.33.9.1/api/v1/proxy/namespaces/kube-system/services/heapster
Kibana is running at https://52.33.9.1/api/v1/proxy/namespaces/kube-system/services/kibana-logging
KubeDNS is running at https://52.33.9.1/api/v1/proxy/namespaces/kube-system/services/kube-dns
kubernetes-dashboard is running at https://52.33.9.1/api/v1/proxy/namespaces/kube-system/services/kubernetes-dashboard
Grafana is running at https://52.33.9.1/api/v1/proxy/namespaces/kube-system/services/monitoring-grafana
InfluxDB is running at https://52.33.9.1/api/v1/proxy/namespaces/kube-system/services/monitoring-influxdb
I can see the instances in EC2 console. How do I ssh into the master node?
Here is the exact command that worked for me:
ssh -i ~/.ssh/kube_aws_rsa admin@<masterip>
kube_aws_rsa is the default key generated; otherwise it is controlled with the AWS_SSH_KEY environment variable. For AWS, it is specified in the file cluster/aws/config-default.sh.
More details about the cluster can be found using kubectl.sh config view.
"Creates an AWS SSH key named kubernetes-. Fingerprint here is the OpenSSH key fingerprint, so that multiple users can run the script with different keys and their keys will not collide (with near-certainty). It will use an existing key if one is found at AWS_SSH_KEY, otherwise it will create one there. (With the default Ubuntu images, if you have to SSH in: the user is ubuntu and that user can sudo"
https://github.com/kubernetes/kubernetes/blob/master/docs/design/aws_under_the_hood.md
You should see the SSH key fingerprint locally in your SSH config, or set the environment variable and recreate the cluster.
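For instance, a hedged sketch of pointing the scripts at an existing key and logging in with the default Ubuntu user (the key path and master IP are placeholders):
# use an existing key instead of the generated one, then re-run kube-up.sh
export AWS_SSH_KEY=~/.ssh/my_existing_key
# with the default Ubuntu images the login user is ubuntu
ssh -i ~/.ssh/my_existing_key ubuntu@<masterip>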
If you are bringing up your cluster on AWS with kops and use CoreOS as your image, then the login name would be "core".