EKS Anywhere Cluster cert-manager io-timeout - amazon-eks

This is my first time trying an EKS Anywhere Docker provider deployment, following the guide at the link below:
https://anywhere.eks.amazonaws.com/docs/getting-started/local-environment/
It gets stuck at 'waiting for cert-manager'. I am working on CentOS 7 and the system is behind a proxy.
Installing cert-manager Version="v1.5.3+66e1acc"
Using Override="cert-manager.yaml" Provider="cert-manager" Version="v1.5.3+66e1acc"
Waiting for cert-manager to be available...
Error: timed out waiting for the condition
Only the cert-manager pods are unable to pull their images:
NAMESPACE NAME READY STATUS RESTARTS AGE
cert-manager cert-manager-7988d4fb6c-bjhsv 0/1 ImagePullBackOff 0 5m54s
cert-manager cert-manager-cainjector-6bc8dcdb64-hvdx5 0/1 ImagePullBackOff 0 5m55s
cert-manager cert-manager-webhook-68979bfb95-q8ttt 0/1 ImagePullBackOff 0 5m54s
kube-system coredns-745c7986c7-2wrx5 1/1 Running 0 5m57s
kube-system coredns-745c7986c7-kx594 1/1 Running 0 5m57s
kube-system etcd-dev-cluster-eks-a-cluster-control-plane 1/1 Running 0 5m52s
kube-system kindnet-4jcvt 1/1 Running 0 5m57s
kube-system kube-apiserver-dev-cluster-eks-a-cluster-control-plane 1/1 Running 0 5m52s
kube-system kube-controller-manager-dev-cluster-eks-a-cluster-control-plane 1/1 Running 0 5m52s
kube-system kube-proxy-4dk2r 1/1 Running 0 5m57s
kube-system kube-scheduler-dev-cluster-eks-a-cluster-control-plane 1/1 Running 0 5m52s
local-path-storage local-path-provisioner-666bfc797f-nkhqf 1/1 Running 0 5m57s
The same images can be pulled successfully with docker pull:
public.ecr.aws/eks-anywhere/jetstack/cert-manager-webhook v1.5.3-eks-a-6 194bcfda671e 3 months ago 46MB
public.ecr.aws/eks-anywhere/jetstack/cert-manager-controller v1.5.3-eks-a-6 1e6749016508 3 months ago 61.3MB
public.ecr.aws/eks-anywhere/jetstack/cert-manager-cainjector v1.5.3-eks-a-6 45723d794a88 3 months ago 42.4MB
kubectl describe shows the i/o timeout error below, as well as a 'server misbehaving' error:
Failed to pull image "public.ecr.aws/eks-anywhere/jetstack/cert-manager-controller:v1.5.3-eks-a-6": rpc error: code = Unknown desc = failed to pull and unpack image "public.ecr.aws/eks-anywhere/jetstack/cert-manager-controller:v1.5.3-eks-a-6": failed to resolve reference "public.ecr.aws/eks-anywhere/jetstack/cert-manager-controller:v1.5.3-eks-a-6": failed to do request: Head "https://public.ecr.aws/v2/eks-anywhere/jetstack/cert-manager-controller/manifests/v1.5.3-eks-a-6": dial tcp: lookup public.ecr.aws on 172.19.0.1:53: read udp 172.19.0.2:38941->172.19.0.1:53: i/o timeout

It was a proxy-related issue. I resolved it by adding the proxy configuration to the containerd service inside the node's Docker container and then restarting containerd.
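# On the host: open a shell in the node container
docker exec -it <container name> bash
# Inside the container: create the systemd drop-in directory for containerd
mkdir -p /etc/systemd/system/containerd.service.d
# The heredoc delimiter is unquoted, so ${NO_PROXY} and ${LOCAL_NETWORK}
# are expanded by this shell at the moment the file is written
cat <<EOF >/etc/systemd/system/containerd.service.d/http-proxy.conf
[Service]
Environment="HTTP_PROXY=http://<proxy ip>:<proxy port>"
Environment="HTTPS_PROXY=http://<proxy ip>:<proxy port>"
Environment="NO_PROXY=${NO_PROXY:-localhost},${LOCAL_NETWORK}"
EOF
# Reload the unit files and restart containerd so the settings take effect
systemctl daemon-reload
systemctl restart containerd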
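The same drop-in can also be produced by a small helper that takes the values as explicit arguments, so nothing depends on which variables happen to be set in the container's shell. This is only a sketch: the function name write_containerd_proxy_dropin, its arguments, and the default NO_PROXY list are assumptions made for illustration and are not part of EKS Anywhere.

```shell
# Sketch: write a systemd drop-in that sets proxy variables for containerd.
# All names and defaults here are illustrative assumptions.
#   $1 = drop-in directory, e.g. /etc/systemd/system/containerd.service.d
#   $2 = proxy URL, e.g. http://<proxy ip>:<proxy port>
#   $3 = comma-separated NO_PROXY list (optional)
write_containerd_proxy_dropin() {
  dropin_dir="$1"
  proxy_url="$2"
  no_proxy_list="${3:-localhost,127.0.0.1}"
  if [ -z "$dropin_dir" ] || [ -z "$proxy_url" ]; then
    echo "usage: write_containerd_proxy_dropin <dir> <proxy-url> [no-proxy]" >&2
    return 1
  fi
  mkdir -p "$dropin_dir" || return 1
  # printf writes the final values literally into the unit drop-in.
  printf '[Service]\nEnvironment="HTTP_PROXY=%s"\nEnvironment="HTTPS_PROXY=%s"\nEnvironment="NO_PROXY=%s"\n' \
    "$proxy_url" "$proxy_url" "$no_proxy_list" > "$dropin_dir/http-proxy.conf"
}
```

Inside the node container it would be called with the containerd drop-in directory and the real proxy URL, followed by systemctl daemon-reload and systemctl restart containerd.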

Not able to access the RabbitMQ cluster which is set up using the RabbitMQ cluster operator

I have an AWS instance with minikube installed, and I have added the RabbitMQ cluster operator to it. I then started a RabbitMQ cluster with 3 nodes. I can see 3 pods and their logs show no errors. The RabbitMQ service is started as a LoadBalancer. When I list the URLs for the service I get the RabbitMQ, RabbitMQ management UI and Prometheus ports. No external IP was generated for the service, so I used the patch command to assign one.
My issue is that the RabbitMQ cluster is running fine with no errors, but I am not able to access it using the public IP of the AWS instance, so other services cannot send messages to it.
Here are all the files --
clientq.yml file
apiVersion: rabbitmq.com/v1beta1
kind: RabbitmqCluster
metadata:
  name: clientq
spec:
  replicas: 3
  image: rabbitmq:3.9-management
  resources:
    requests:
      cpu: 500m
      memory: 1Gi
    limits:
      cpu: 1
      memory: 2Gi
  rabbitmq:
    additionalConfig: |
      log.console.level = info
      channel_max = 1700
      default_user = guest
      default_pass = guest
      default_user_tags.administrator = true
  service:
    type: LoadBalancer
all setup --
kubectl get all
NAME READY STATUS RESTARTS AGE
pod/clientq-server-0 1/1 Running 0 11m
pod/clientq-server-1 1/1 Running 0 11m
pod/clientq-server-2 1/1 Running 0 11m
NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S)
service/clientq LoadBalancer 10.108.225.186 12.27.54.12 5672:31063/TCP,15672:31340/TCP,15692:30972/TCP
service/clientq-nodes ClusterIP None <none> 4369/TCP,25672/TCP
service/kubernetes ClusterIP 10.96.0.1 <none> 443/TCP
NAME READY AGE
statefulset.apps/clientq-server 3/3 11m
NAME ALLREPLICASREADY RECONCILESUCCESS AGE
rabbitmqcluster.rabbitmq.com/clientq True True 11m
Here 12.27.54.12 is the public IP of my instance,
which I patched in using
kubectl patch svc clientq -n default -p '{"spec": {"type": "LoadBalancer", "externalIPs":["12.27.54.12"]}}'
the urls for service are --
minikube service clientq --url
http://192.168.49.2:31063
http://192.168.49.2:31340
http://192.168.49.2:30972
I am able to curl these from the instance itself, but I am not able to access them via the public IP of the instance. Did I miss something, or is there a way to expose these ports? Please let me know.
I have opened all ports for inbound and outbound traffic.

AWS EKS CoreDNS issue on public subnet

I completed the AWS EKS setup following their setup steps.
AWS EKS ver 1.11, coredns
In the VPC I created two public and two private subnets according to their docs here: https://docs.aws.amazon.com/eks/latest/userguide/create-public-private-vpc.html
Nodes deployed to a private subnet are labeled private and nodes deployed to a public subnet are labeled public.
When I deploy a busybox pod with each nodeSelector (public/private), the pod on the public node cannot resolve DNS while the pod on the private node can.
nslookup: can't resolve 'kubernetes.default'
If I ssh onto the public subnet node itself, I am able to ping hostnames (e.g. google.com) successfully.
Any thoughts?
# kubectl exec -it busybox-private -- nslookup kubernetes.default
Server: 172.20.0.10
Address 1: 172.20.0.10 ip-172-20-0-10.ec2.internal
Name: kubernetes.default
Address 1: 172.20.0.1 ip-172-20-0-1.ec2.internal
# kubectl exec -it busybox-public -- nslookup kubernetes.default
Server: 172.20.0.10
Address 1: 172.20.0.10
nslookup: can't resolve 'kubernetes.default'
command terminated with exit code 1
# kubectl -n=kube-system get all
NAME READY STATUS RESTARTS AGE
pod/aws-node-46626 1/1 Running 0 3h
pod/aws-node-52rqw 1/1 Running 1 3h
pod/aws-node-j7n8l 1/1 Running 0 3h
pod/aws-node-k7kbr 1/1 Running 0 3h
pod/aws-node-tr8x7 1/1 Running 0 3h
pod/coredns-7bcbfc4774-5ssnx 1/1 Running 0 20h
pod/coredns-7bcbfc4774-vxrgs 1/1 Running 0 20h
pod/kube-proxy-2c7gj 1/1 Running 0 3h
pod/kube-proxy-5qr9h 1/1 Running 0 3h
pod/kube-proxy-6r96f 1/1 Running 0 3h
pod/kube-proxy-9tqxt 1/1 Running 0 3h
pod/kube-proxy-bhkzx 1/1 Running 0 3h
NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE
service/kube-dns ClusterIP 172.20.0.10 <none> 53/UDP,53/TCP 20h
NAME DESIRED CURRENT READY UP-TO-DATE AVAILABLE NODE SELECTOR AGE
daemonset.apps/aws-node 5 5 5 5 5 <none> 20h
daemonset.apps/kube-proxy 5 5 5 5 5 <none> 20h
NAME DESIRED CURRENT UP-TO-DATE AVAILABLE AGE
deployment.apps/coredns 2 2 2 2 20h
NAME DESIRED CURRENT READY AGE
replicaset.apps/coredns-7bcbfc4774 2 2 2 20h
Going through "Debugging DNS Resolution"
https://kubernetes.io/docs/tasks/administer-cluster/dns-debugging-resolution/
Odd that AWS has their coredns pods still labelled kube-dns
# kubectl get pods --namespace=kube-system -l k8s-app=kubedns
No resources found.
# kubectl get pods --namespace=kube-system -l k8s-app=kube-dns
NAME READY STATUS RESTARTS AGE
coredns-7bcbfc4774-5ssnx 1/1 Running 0 20h
coredns-7bcbfc4774-vxrgs 1/1 Running 0 20h
# for p in $(kubectl get pods --namespace=kube-system -l k8s-app=kube-dns -o name); do kubectl logs --namespace=kube-system $p; done
2019/01/31 15:23:36 [INFO] CoreDNS-1.1.3
2019/01/31 15:23:36 [INFO] linux/amd64, go1.10.5, d47c9319
.:53
CoreDNS-1.1.3
linux/amd64, go1.10.5, d47c9319
2019/01/31 15:23:36 [INFO] CoreDNS-1.1.3
2019/01/31 15:23:36 [INFO] linux/amd64, go1.10.5, d47c9319
.:53
CoreDNS-1.1.3
linux/amd64, go1.10.5, d47c9319
Looking at the worker node security groups is where I think I found the issue.
The AWS EKS kube-dns endpoints and pods were on the private subnet.
I have two CloudFormation stacks: one for autoscaling nodes in the private subnets and one for autoscaling nodes in the public subnets.
They didn't have a common security group so the pods running in the public nodes weren't able to access the kube-dns pods running on the private nodes.
Once I updated the worker node security groups to allow cross-communication, DNS started working.
Please post if anyone sees any unintended consequences.
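For reference, a rule of this kind can be expressed with the AWS CLI. The sketch below only prints the commands instead of running them, and it is deliberately narrower than what I did: it opens just DNS (port 53 over UDP and TCP) from one node security group to the other, whereas I allowed cross-communication between the groups more broadly. The function name and the security group IDs are placeholders.

```shell
# Sketch: print AWS CLI commands that let one security group reach
# another on the DNS port. Nothing is executed here; review the output
# first, then run the printed commands.
#   $1 = security group of the nodes hosting the kube-dns pods (placeholder)
#   $2 = security group of the nodes whose pods need DNS (placeholder)
print_dns_ingress_rules() {
  dns_sg="$1"
  client_sg="$2"
  if [ -z "$dns_sg" ] || [ -z "$client_sg" ]; then
    echo "usage: print_dns_ingress_rules <dns-sg-id> <client-sg-id>" >&2
    return 1
  fi
  for proto in udp tcp; do
    echo "aws ec2 authorize-security-group-ingress --group-id $dns_sg --protocol $proto --port 53 --source-group $client_sg"
  done
}
```

In my layout the first argument would be the security group of the private node stack and the second the security group of the public node stack.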

Making logs available to Stackdriver from a Custom Kubernetes docker container running Apache and PHP-FPM

We are running a small test cluster of custom Kubernetes pods on Google Cloud that internally run Apache and PHP-FPM.
The Cluster has the following key config:
Master version: 1.10.6-gke.2
Kubernetes alpha features: Disabled
Total size: 3
StackDriver Logging: Enabled
StackDriver Monitoring: Enabled
Once the cluster comes up, kubectl get pods --all-namespaces shows the fluentd and heapster services running alongside our services, as I would expect.
kube-system event-exporter-v0.2.1-5f5b89fcc8-r89d5 2/2 Running 0 13d
kube-system fluentd-gcp-scaler-7c5db745fc-gbrqx 1/1 Running 0 21d
kube-system fluentd-gcp-v3.1.0-76mr4 2/2 Running 0 13d
kube-system fluentd-gcp-v3.1.0-kl4xp 2/2 Running 0 13d
kube-system fluentd-gcp-v3.1.0-vxsq5 2/2 Running 0 13d
kube-system heapster-v1.5.3-95c7549b8-fdlmm 3/3 Running 0 13d
kube-system kube-dns-788979dc8f-c9v2d 4/4 Running 0 99d
kube-system kube-dns-788979dc8f-rqp7d 4/4 Running 0 99d
kube-system kube-dns-autoscaler-79b4b844b9-zjtwk 1/1 Running 0 99d
We can get the logging from our application code (that runs inside our pods) to show up in Stackdriver Logging, but we want to aggregate the logging for Apache (/var/log/httpd/access_log and error_log) and PHP-FPM in Stackdriver as well.
This page from Google's Docs implies that this should be enabled by default.
https://cloud.google.com/kubernetes-engine/docs/how-to/logging
Note: Stackdriver Logging is enabled by default when you create a new cluster using the gcloud command-line tool or Google Cloud Platform Console.
However that is obviously not the case for us. We have tried a few different approaches to get this to work (listed below), but without success.
Including:
Redirecting the log output from Apache to stdout and/or stderr, as described in this post.
https://serverfault.com/questions/711168/writing-apache2-logs-to-stdout-stderr
Installing the stackdriver agent inside each pod as described in https://cloud.google.com/monitoring/agent/plugins/apache#configuring
It didn't appear that this step should be required, as the documentation implies that you only need to do this on a VM instance, but we tried it anyway on our k8s pods. As part of this step we made sure that Apache has mod_status enabled (/server-status) and PHP-FPM has /fpm-status enabled, and then installed the Apache plugin following the docs.
Piping the Apache logging to STDOUT
How to Redirect Apache Logs to both STDOUT and Apache Log File
This seems like it should be a simple thing to do, but we have obviously missed something. Any help would be most appreciated.
Cheers, Julian Cone

How to configure kubernetes selenium container to reach external net?

I tried the Selenium example on minikube on Windows.
https://github.com/kubernetes/kubernetes/tree/master/examples/selenium
Inside the container, I can't install selenium. What should I do?
pip install selenium
cmd:
kubectl run selenium-hub --image selenium/hub:2.53.1 --port 4444
kubectl expose deployment selenium-hub --type=NodePort
kubectl run selenium-node-chrome --image selenium/node-chrome:2.53.1 --env="HUB_PORT_4444_TCP_ADDR=selenium-hub" --env="HUB_PORT_4444_TCP_PORT=4444"
kubectl scale deployment selenium-node-chrome --replicas=4
kubectl run selenium-python --image=google/python-hello
kubectl exec --stdin=true --tty=true selenium-python-6479976d89-ww7jv bash
display:
PS C:\Program Files\Docker Toolbox\dockerfiles> kubectl get pods
NAME READY STATUS RESTARTS AGE
selenium-hub-5ffc6ff7db-gwq95 1/1 Running 0 15m
selenium-node-chrome-8659b47488-brwb4 1/1 Running 0 8m
selenium-node-chrome-8659b47488-dnrwr 1/1 Running 0 8m
selenium-node-chrome-8659b47488-hwvvk 1/1 Running 0 11m
selenium-node-chrome-8659b47488-t8g59 1/1 Running 0 8m
selenium-python-6479976d89-ww7jv 1/1 Running 0 6m
PS C:\Program Files\Docker Toolbox\dockerfiles> kubectl get services
NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE
kubernetes ClusterIP 10.0.0.1 <none> 443/TCP 17m
selenium-hub NodePort 10.0.0.230 <none> 4444:32469/TCP 16m
PS C:\Program Files\Docker Toolbox\dockerfiles> kubectl exec --stdin=true --tty=true selenium-python-6479976d89-ww7jv bash
root@selenium-python-6479976d89-ww7jv:/app# ping yahoo.com
ping: unknown host yahoo.com
It looks like your pod cannot resolve DNS. You need to test whether your cluster has a working kube-dns in the kube-system namespace. If it is there and operational, check whether it resolves names correctly when queried directly by its pod IP, and verify that your containers have the correct content in /etc/resolv.conf when started.
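The /etc/resolv.conf check can be sketched as a small filter that reads the file's text and tests whether a given DNS address is listed as a nameserver. The function names are illustrative, and the kubectl wiring in the comments assumes a working kubectl context and a pod name you fill in yourself.

```shell
# Sketch: check whether resolv.conf text (on stdin) points at an expected
# DNS server. Function names are illustrative.

# Print every address that appears on a "nameserver" line.
list_nameservers() {
  awk '$1 == "nameserver" { print $2 }'
}

# Succeed only if the expected IP ($1) is one of the nameservers.
resolv_conf_uses() {
  expected_ip="$1"
  list_nameservers | grep -qxF "$expected_ip"
}

# Example wiring (needs a cluster; <pod> is a placeholder):
#   dns_ip=$(kubectl get svc kube-dns -n kube-system -o jsonpath='{.spec.clusterIP}')
#   kubectl exec <pod> -- cat /etc/resolv.conf | resolv_conf_uses "$dns_ip" \
#     && echo "pod points at cluster DNS" || echo "pod uses a different resolver"
```

If the pod does point at the cluster DNS address and lookups still fail, the problem is more likely in kube-dns itself or in its upstream resolvers than in the pod's configuration.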
You can avoid this problem by providing a ConfigMap that configures kube-dns with custom DNS servers.
apiVersion: v1
kind: ConfigMap
metadata:
  name: kube-dns
  namespace: kube-system
data:
  stubDomains: |
    {"acme.local": ["1.2.3.4"]}
  upstreamNameservers: |
    ["8.8.8.8"]
See more details at Kubernetes Reference Doc

Kubernetes Redis Cluster issue

I'm trying to create a Redis cluster using Kubernetes on CentOS. I have my Kubernetes master running on one host and Kubernetes slaves on 2 different hosts.
etcdctl get /kube-centos/network/config
{ "Network": "172.30.0.0/16", "SubnetLen": 24, "Backend": { "Type": "vxlan" } }
Here is my replication controller
apiVersion: v1
kind: ReplicationController
metadata:
  name: redis-master
  labels:
    app: redis
    role: master
    tier: backend
spec:
  replicas: 6
  template:
    metadata:
      labels:
        app: redis
        role: master
        tier: backend
    spec:
      containers:
      - name: master
        image: redis
        command:
        - "redis-server"
        args:
        - "/redis-master/redis.conf"
        ports:
        - containerPort: 6379
        volumeMounts:
        - mountPath: /redis-master
          name: config
        - mountPath: /redis-master-data
          name: data
      volumes:
      - name: data
        emptyDir: {}
      - name: config
        configMap:
          name: redis-config
          items:
          - key: redis-config
            path: redis.conf
kubectl create -f rc.yaml
NAME READY STATUS RESTARTS AGE IP NODE
redis-master-149tt 1/1 Running 0 8s 172.30.96.4 centos-minion-1
redis-master-14j0k 1/1 Running 0 8s 172.30.79.3 centos-minion-2
redis-master-3wgdt 1/1 Running 0 8s 172.30.96.3 centos-minion-1
redis-master-84jtv 1/1 Running 0 8s 172.30.96.2 centos-minion-1
redis-master-fw3rs 1/1 Running 0 8s 172.30.79.4 centos-minion-2
redis-master-llg9n 1/1 Running 0 8s 172.30.79.2 centos-minion-2
Redis-config file used
appendonly yes
cluster-enabled yes
cluster-config-file /redis-master/nodes.conf
cluster-node-timeout 5000
dir /redis-master
port 6379
I used the following command to create the kubernetes service.
kubectl expose rc redis-master --name=redis-service --port=6379 --target-port=6379 --type=NodePort
Name: redis-service
Namespace: default
Labels: app=redis
role=master
tier=backend
Selector: app=redis,role=master,tier=backend
Type: NodePort
IP: 10.254.229.114
Port: <unset> 6379/TCP
NodePort: <unset> 30894/TCP
Endpoints: 172.30.79.2:6379,172.30.79.3:6379,172.30.79.4:6379 + 3 more...
Session Affinity: None
No events.
Now I have all the pods and service up and running. I'm using redis-trib pod to create redis cluster.
kubectl exec -it redis-trib bash
./redis-trib.rb create --replicas 1 172.30.79.2:6379 172.30.79.3:6379 172.30.79.4:6379 172.30.96.2:6379 172.30.96.3:6379 172.30.96.4:6379
Redis Cluster created as expected with the below message.
[OK] All 16384 slots covered.
Now I should be able to access my redis-cluster on the Kubernetes node IP (192.168.240.116) and nodePort (30894) from any host within my network. Everything works as expected when I execute the command below from one of the Kubernetes nodes.
redis-cli -p 30894 -h 192.168.240.116 -c
192.168.240.116:30894> set foo bar
-> Redirected to slot [12182] located at 172.30.79.4:6379
OK
172.30.79.4:6379>
When I run the same command from a different (non-Kubernetes) node within the same network, I see a connection timed out error.
redis-cli -c -p 30894 -h 192.168.240.116
192.168.240.116:30894> set foo bar
-> Redirected to slot [12182] located at 172.30.79.4:6379
Could not connect to Redis at 172.30.79.4:6379: Connection timed out
Is it not possible to access the redis-cluster from outside the Kubernetes cluster network when it is exposed using the NodePort service type?
Also, I cannot use the LoadBalancer service type as I'm not hosting it in the cloud.
I have been stuck on this issue for quite a while. Can someone suggest what approach I should use to access my redis-cluster from outside my network?
Thanks
Running ./redis-trib.rb create --replicas 1 172.30.79.2:6379 172.30.79.3:6379 172.30.79.4:6379 172.30.96.2:6379 172.30.96.3:6379 172.30.96.4:6379 doesn't make sense with this setup.
From outside the cluster network, port 6379 is only reachable through the service which you brought up, never directly on the pod IPs as you are trying. That's why you run into issues when you try to use your setup.
What you can do is expose each pod with its own service and have one additional cluster service to load-balance external requests, as shown in the example repository from Kelsey Hightower. This way the pods can communicate through the internally exposed ports and (external) clients can use the load-balanced cluster port. The implication is that each pod then requires its own ReplicaSet (or Deployment). There's a long talk available on YouTube from Kelsey explaining the setup - YouTube / Slideshare.
An alternative would be to use a single Redis master, as shown in other examples.
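The per-pod service layout described above could look roughly like the manifests printed by the sketch below. It is not taken from the referenced repository: the function name, the service names, and the per-pod "instance" label are all hypothetical, and the label only works if each pod is created by its own ReplicaSet or Deployment that sets it.

```shell
# Sketch: print one Service per Redis pod plus one shared NodePort Service.
# Assumes (hypothetically) each pod carries its own label, e.g.
# instance=redis-1, set by a dedicated ReplicaSet or Deployment.
#   $1 = number of Redis pods
print_redis_services() {
  count="$1"
  case "$count" in
    ''|*[!0-9]*)
      echo "usage: print_redis_services <pod-count>" >&2
      return 1
      ;;
  esac
  i=1
  while [ "$i" -le "$count" ]; do
    # One Service selecting exactly one pod through its instance label.
    cat <<EOF
---
apiVersion: v1
kind: Service
metadata:
  name: redis-$i
spec:
  selector:
    app: redis
    instance: redis-$i
  ports:
  - port: 6379
    targetPort: 6379
EOF
    i=$((i + 1))
  done
  # One shared Service that spreads external requests over all pods.
  cat <<EOF
---
apiVersion: v1
kind: Service
metadata:
  name: redis-cluster
spec:
  type: NodePort
  selector:
    app: redis
  ports:
  - port: 6379
    targetPort: 6379
EOF
}
```

The output can be reviewed and then piped to kubectl apply -f -. Whether external clients can follow Redis Cluster redirects still depends on the address each Redis node announces, which this sketch does not address.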