Azure ACS - Kubernetes inter-pod communication - azure-container-service

I've made an ACS instance.
az acs create --orchestrator-type=kubernetes \
--resource-group $group \
--name $k8s_name \
--dns-prefix $kubernetes_server \
--generate-ssh-keys
az acs kubernetes get-credentials --resource-group $group --name $k8s_name
I then ran helm init and it provisioned the tiller pod fine. After that, helm install stable/redis got a redis deployment up and running (seemingly).
I can kubectl exec -it into the redis pod and can see it's binding on 0.0.0.0. I can log in with redis-cli -h localhost and redis-cli -h <pod_ip>, but not with redis-cli -h <service_ip> (from kubectl get svc).
If I spin up another pod (which is how I ran into this issue), I can ping redis.default and the DNS resolves to the correct service IP, but there is no response. When I telnet <service_ip> 6379 or redis-cli -h <service_ip>, it hangs indefinitely.
I'm at a bit of a loss as to how to debug further. I can't ssh into the node to see what docker is doing.
Also, I'd initially tried this with a standard Alpine-based Redis image, so the Helm chart was a fallback. Yesterday the Helm one worked but the manual one didn't. Today (on a newly built ACS cluster) neither works.
I'm going to spin up the cluster again to see if it's a stable reproduction, but I'm pretty confident something fishy is going on.
PS - I have a VNet with an overlapping 10.0.0.0/16 subnet in a different region; when I go into the address range I do get a warning that there is a clash. Could that affect this?
<EDIT>
Some new insight... it's something to do with Alpine-based images (which we've been aiming to use)...
So kubectl run a --image=nginx (which is not Alpine-based) and I can shell in, install telnet and connect to the redis service.
But, e.g., kubectl run c --image=rlesouef/alpine-redis, then shell in, and telnet doesn't work to the same redis service.
</EDIT>

There was a similar issue (https://github.com/Azure/acs-engine/issues/539) that was fixed recently. One thing to verify is whether nslookup works in the container.
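For example, one quick way to test this from a throwaway pod (a sketch; redis.default assumes the service is named redis in the default namespace, and busybox:1.28 is suggested because newer busybox images have an unreliable nslookup):
kubectl run dnstest -it --rm --restart=Never --image=busybox:1.28 -- nslookup redis.default.svc.cluster.local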

Related

Restart pod depending on health check

I am using Azure Kubernetes Service. I found that sometimes health checks to SQL Server fail, and then my API responds to every request with code 400.
In this case a simple pod restart usually helps; I thought liveness/readiness probes would handle that scenario, but they don't.
Any ideas how I can automate pod restarts if this happens again?
Monitor and restart unhealthy docker containers. This functionality was proposed to be included with the addition of HEALTHCHECK, but it didn't make the cut. This container is a stand-in until there is native support for --exit-on-unhealthy (https://github.com/docker/docker/pull/22719).
A sample run command is:
docker run -d \
--name autoheal \
--restart=always \
-e AUTOHEAL_CONTAINER_LABEL=all \
-v /var/run/docker.sock:/var/run/docker.sock \
willfarrell/autoheal
Simply start the autoheal container as shown above, then either:
a) Apply the label autoheal=true to your container to have it watched.
b) Set ENV AUTOHEAL_CONTAINER_LABEL=all to watch all running containers.
c) Set ENV AUTOHEAL_CONTAINER_LABEL to an existing label name whose value is true.
Refer to the official documentation (https://hub.docker.com/r/willfarrell/autoheal/) for more details.
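For example, a minimal sketch of option (a), assuming a hypothetical my-app image that defines its own HEALTHCHECK:
docker run -d --name my-app --label autoheal=true my-app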

How connect to MSK cluster from EKS cluster

I am having difficulties connecting to my MSK cluster from my EKS cluster even though both clusters share the same VPC and the same subnets.
The security group used by the MSK cluster has the following inbound rules:
type          protocol   port range   source
all traffic   all        all          custom: SG_ID
all traffic   all        all          anywhere ipv4: 0.0.0.0/0
Where SG_ID is the EKS cluster's security group (the one labeled "EKS created security group applied...").
In the EKS cluster, I am using the following commands to test connectivity:
kubectl run kafka-consumer \
-ti \
--image=quay.io/strimzi/kafka:latest-kafka-2.8.1 \
--rm=true \
--restart=Never \
-- bin/kafka-topics.sh --create --topic test --bootstrap-server b-1.test.z35y0w.c4.kafka.us-east-1.amazonaws.com:9092 --replication-factor 2 --partitions 1 --if-not-exists
With the following result
Error while executing topic command : Call(callName=createTopics, deadlineMs=1635906680860, tries=1, nextAllowedTryMs=1635906680961) timed out at 1635906680861 after 1 attempt(s)
[2021-11-03 02:31:20,865] ERROR org.apache.kafka.common.errors.TimeoutException: Call(callName=createTopics, deadlineMs=1635906680860, tries=1, nextAllowedTryMs=1635906680961) timed out at 1635906680861 after 1 attempt(s)
Caused by: org.apache.kafka.common.errors.TimeoutException: Timed out waiting for a node assignment. Call: createTopics
(kafka.admin.TopicCommand$)
pod "kafka-consumer" deleted
pod default/kafka-consumer terminated (Error)
Sadly, the second bootstrap server displayed on the MSK Page gives the same result.
nc eventually times out
kubectl run busybox -ti --image=busybox --rm=true --restart=Never -- nc b-2.test.z35y0w.c4.kafka.us-east-1.amazonaws.com
nslookup fails as well
kubectl run busybox -ti --image=busybox --rm=true --restart=Never -- nslookup b-2.test.z35y0w.c4.kafka.us-east-1.amazonaws.com
If you don't see a command prompt, try pressing enter.
*** Can't find b-2.test.z35y0w.c4.kafka.us-east-1.amazonaws.com: No answer
Could anyone please give me a hint?
Thanks
I needed to connect to MSK from my EKS pod, so I searched the docs; I want to share my solution in the hope it helps others.
This is my config file:
root@kain:~/work# cat kafkaconfig
security.protocol=SASL_SSL
sasl.mechanism=AWS_MSK_IAM
sasl.jaas.config=software.amazon.msk.auth.iam.IAMLoginModule required;
sasl.client.callback.handler.class=software.amazon.msk.auth.iam.IAMClientCallbackHandler
This is my command:
./kafka-topics.sh --list --bootstrap-server <My MSK bootstrap server>:9098 --command-config ./kafkaconfig
For this command there are two preconditions to satisfy:
first, you need access to AWS MSK (I access MSK from my EKS pod, and my EKS pod has an OIDC role to access AWS).
Second, you need the AWS auth JAR file aws-msk-iam-auth.jar
(releases: https://github.com/aws/aws-msk-iam-auth/releases);
put it into the Kafka client's libs directory, or export CLASSPATH=/aws-msk-iam-auth-1.1.4-all.jar
reference doc: https://aws.amazon.com/blogs/big-data/securing-apache-kafka-is-easy-and-familiar-with-iam-access-control-for-amazon-msk/
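Putting it together, a sketch of the full invocation (the jar path and the 1.1.4 version are assumptions; substitute your own bootstrap server):
export CLASSPATH=/opt/aws-msk-iam-auth-1.1.4-all.jar   # or drop the jar into the Kafka client's libs directory
./kafka-topics.sh --list --bootstrap-server <My MSK bootstrap server>:9098 --command-config ./kafkaconfig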

Docker Socket Without TLS

I have a TLS-secured Docker daemon running. I use TLS for remote access to the Docker daemon and access Docker locally without TLS. Normally...
Recently I updated Docker. Apparently I cannot connect to the local socket anymore. I suppose Docker now uses TLS for both remote and local connections.
Is there a way to disable TLS for the local Docker socket?
Output of ps auxw | grep dockerd:
/usr/bin/dockerd -H 0.0.0.0:2376 --tlsverify --tlscacert /home/dockermanager/.docker/ca.pem --tlscert /home/dockermanager/.docker/server-cert.pem --tlskey /home/dockermanager/.docker/server-key.pem
I was able to fix this myself.
I needed to migrate to these two systemd files provided by Docker:
https://github.com/moby/moby/tree/master/contrib/init/systemd
One unit file is for the Docker daemon and there is a separate one for the Docker socket. The socket is a required dependency of docker.service and will be loaded, restarted and stopped accordingly.
Then I needed to add the daemon parameter -H unix:// so that the daemon listens on the local socket again.
Afterwards everything worked as before, and I assume local docker.socket communication does not need TLS verification at all.
Start command now:
/usr/bin/dockerd -H unix:// -H tcp://0.0.0.0:2376 --tlsverify --tlscacert /home/dockeruser/.docker/ca.pem --tlscert /home/dockeruser/.docker/server-cert.pem --tlskey /home/dockeruser/.docker/server-key.pem
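For reference, one way to apply such a start command without editing the packaged unit file is a systemd drop-in (a sketch reusing the certificate paths above; adjust them to your installation):
sudo mkdir -p /etc/systemd/system/docker.service.d
sudo tee /etc/systemd/system/docker.service.d/override.conf <<'EOF'
[Service]
ExecStart=
ExecStart=/usr/bin/dockerd -H unix:// -H tcp://0.0.0.0:2376 --tlsverify --tlscacert /home/dockeruser/.docker/ca.pem --tlscert /home/dockeruser/.docker/server-cert.pem --tlskey /home/dockeruser/.docker/server-key.pem
EOF
sudo systemctl daemon-reload
sudo systemctl restart docker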

CentOS7: Are you trying to connect to a TLS-enabled daemon without TLS?

I've installed Docker on CentOS 7; now I'm trying to launch the server in a Docker container.
$ docker run -d --name "openshift-origin" --net=host --privileged \
-v /var/run/docker.sock:/var/run/docker.sock \
-v /tmp/openshift:/tmp/openshift \
openshift/origin start
This is the output:
Post http:///var/run/docker.sock/v1.19/containers/create?name=openshift-origin: dial unix /var/run/docker.sock: permission denied. Are you trying to connect to a TLS-enabled daemon without TLS?
I have tried the same command with sudo and that works fine (I can also run images, get into the OpenShift bash, etc.). But it feels wrong to use it, am I right? What is a solution to make this work as a normal user?
Docker is running (sudo service docker start). Restarting CentOS did not help.
The error is:
/var/run/docker.sock: permission denied.
That seems pretty clear: the permissions on the Docker socket at /var/run/docker.sock do not permit you to access it. This is reasonably common, because handing someone access to the Docker API is effectively the same as giving them sudo privileges, but without any sort of auditing.
If you are the only person using your system, you can:
Create a docker group or similar if one does not already exist.
Make yourself a member of the docker group
Modify the startup configuration of the docker daemon to make the socket owned by that group by adding -G docker to the options. You'll probably want to edit /etc/sysconfig/docker to make this change, unless it's already configured that way.
With these changes in place, you should be able to access Docker from your user account without requiring sudo.
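A sketch of those steps on CentOS 7 (assuming the stock /etc/sysconfig/docker configuration):
sudo groupadd docker                # only if the group does not already exist
sudo usermod -aG docker $USER       # log out and back in for the membership to take effect
# add "-G docker" to the OPTIONS line in /etc/sysconfig/docker, then restart the daemon:
sudo systemctl restart docker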

Connect from one Docker container to another

I want to run rabbitmq-server in one docker container and connect to it from another container using celery (http://celeryproject.org/)
I have rabbitmq running using the below command...
sudo docker run -d -p :5672 markellul/rabbitmq /usr/sbin/rabbitmq-server
and running the celery via
sudo docker run -i -t markellul/celery /bin/bash
When I try the very basic tutorial to validate the connection (http://docs.celeryproject.org/en/latest/getting-started/first-steps-with-celery.html),
I get a connection refused error:
consumer: Cannot connect to amqp://guest@127.0.0.1:5672//: [Errno 111]
Connection refused.
When I install rabbitmq on the same container as celery it works fine.
What do I need to do to have the containers interact with each other?
[edit 2016]
Direct links are deprecated now. The new way to link containers is docker network connect. It works quite similarly to virtual networks and has a wider feature set than the old way of linking.
First you create your named containers:
docker run --name rabbitmq -d -p :5672 markellul/rabbitmq /usr/sbin/rabbitmq-server
docker run --name celery -it markellul/celery /bin/bash
Then you create a network (last parameter is your network name):
docker network create -d bridge --subnet 172.25.0.0/16 mynetwork
Connect the containers to your newly created network:
docker network connect mynetwork rabbitmq
docker network connect mynetwork celery
Now, both containers are in the same network and can communicate with each other.
A very detailed user guide can be found at Work with networks: Connect containers.
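Once both containers are on the same user-defined network, Docker's embedded DNS lets them resolve each other by container name, so the celery container can reach the broker with a URL like (assuming RabbitMQ's default guest credentials):
amqp://guest@rabbitmq:5672//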
[old answer]
There is a new feature in Docker 0.6.5 called linking, which is meant to help the communication between docker containers.
First, create your rabbitmq container as usual. Note that I also used the new "name" feature, which makes life a little bit easier:
docker run --name rabbitmq -d -p :5672 markellul/rabbitmq /usr/sbin/rabbitmq-server
You can use the link parameter to map a container (we use the name here, the id would be ok too):
docker run --link rabbitmq:amq -i -t markellul/celery /bin/bash
Now you have access to the IP and port of the rabbitmq container, because docker automatically added some environment variables:
$AMQ_PORT_5672_TCP_ADDR
$AMQ_PORT_5672_TCP_PORT
In addition Docker adds a host entry for the source container to the /etc/hosts file. In this example amq will be a defined host in the container.
From Docker documentation:
Unlike host entries in the /etc/hosts file, IP addresses stored in the environment variables are not automatically updated if the source container is restarted. We recommend using the host entries in /etc/hosts to resolve the IP address of linked containers.
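For example, inside the linked celery container (a sketch; amq is the alias chosen in --link rabbitmq:amq above):
echo $AMQ_PORT_5672_TCP_ADDR $AMQ_PORT_5672_TCP_PORT
# or rely on the /etc/hosts entry and use amqp://guest@amq:5672// as the broker URL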
Just get your container ip, and connect to it from another container:
CONTAINER_IP=$(sudo docker inspect --format '{{ .NetworkSettings.IPAddress }}' $CONTAINER_ID)
echo $CONTAINER_IP
When you specify -p 5672, what Docker does is open up a new port on the host, such as 49xxx, and forward it to port 5672 of the container.
You should be able to see which host port forwards to the container by running:
sudo docker ps -a
From there, you can connect directly to the host IP address like so:
amqp://guest@HOST_IP:49xxx
You can't use localhost, because each container is basically its own localhost.
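Alternatively, docker port prints just the mapping for a given container port (a sketch, assuming $CONTAINER_ID holds the RabbitMQ container's ID):
sudo docker port $CONTAINER_ID 5672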
Create Image:
docker build -t "imagename1" .
docker build -t "imagename2" .
Run Docker image:
docker run -it -p 8000:8000 --name=imagename1 imagename1
docker run -it -p 8080:8080 --name=imagename2 imagename2
Create Network:
docker network create -d bridge "networkname"
Connect the containers (created when running the images above) to the network:
docker network connect "networkname" "imagename1"
docker network connect "networkname" "imagename2"
We can add any number of containers to the network, and inspect it with:
docker network inspect "networkname"
I think you can't connect to another container directly by design - that would be the responsibility of the host. An example of sharing data between containers using volumes is given here: http://docs.docker.io/en/latest/examples/couchdb_data_volumes/, but I don't think that is what you're looking for.
I recently found out about https://github.com/toscanini/maestro - that might suit your needs. Let us know if it does :), I haven't tried it myself yet.
Edit. Note that you can read here that native "Container wiring and service discovery" is on the roadmap. I guess 7.0 or 8.0 at the latest.
You can get the docker instance IP with...
CID=$(sudo docker run -d -p :5672 markellul/rabbitmq /usr/sbin/rabbitmq-server); sudo docker inspect $CID | grep IPAddress
But that's not very useful.
You can use pipework to create a private network between docker containers.
This is currently on the 0.8 roadmap:
https://github.com/dotcloud/docker/issues/1143