Redis not found using Copilot and Service Discovery - redis

I have a Load Balanced Web service deployed:
About
Application my-app
Name api
Type Load Balanced Web Service
Configurations
Environment Tasks CPU (vCPU) Memory (MiB) Port
----------- ----- ---------- ------------ ----
production 1 0.25 512 80
Routes
Environment URL
----------- ---
production http://xxxxx.us-east-1.elb.amazonaws.com
Service Discovery
Environment Namespace
----------- ---------
production api.my-app.local:80
Variables
Name Container Environment Value
---- --------- ----------- -----
COPILOT_APPLICATION_NAME " " my-app
COPILOT_ENVIRONMENT_NAME " " production
COPILOT_LB_DNS " " xxxx.us-east-1.elb.amazonaws.com
COPILOT_SERVICE_DISCOVERY_ENDPOINT " " my-app.local
COPILOT_SERVICE_NAME " " api
REDIS_URL " " redis://redis.my-app.local:6379
And redis as a Backend Service in the same Copilot application:
About
Application my-app
Name redis
Type Backend Service
Configurations
Environment Tasks CPU (vCPU) Memory (MiB) Port
----------- ----- ---------- ------------ ----
production 1 0.25 512 6379
Service Discovery
Environment Namespace
----------- ---------
production redis.my-app.local:6379
Variables
Name Container Environment Value
---- --------- ----------- -----
COPILOT_APPLICATION_NAME redis production my-app
COPILOT_ENVIRONMENT_NAME " " production
COPILOT_SERVICE_DISCOVERY_ENDPOINT " " my-app.local
COPILOT_SERVICE_NAME " " redis
When I look at the records on Route53, redis.my-app.local is present. But then the logs in my api always say:
uncaughtException: Redis connection to redis.my-app.local:6379 failed - getaddrinfo ENOTFOUND redis.my-app.local
Then at some point redis shuts down because there are no incoming connections... What is the issue?

It looks like you found the solution: DNS hostnames must be enabled.
For others who may encounter the same issue:
By default, Copilot sets:
EnableDnsHostnames: true
EnableDnsSupport: true
We will add a warning for imported VPCs, but in the meantime, ensure that you have DNS hostnames enabled for your existing VPC.
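For an imported VPC you can verify and flip these attributes yourself with the AWS CLI; a sketch, assuming a placeholder VPC id (note that modify-vpc-attribute accepts only one attribute per call):

```shell
# Check the two DNS attributes on the imported VPC (vpc-0abc123 is a placeholder)
aws ec2 describe-vpc-attribute --vpc-id vpc-0abc123 --attribute enableDnsHostnames
aws ec2 describe-vpc-attribute --vpc-id vpc-0abc123 --attribute enableDnsSupport

# Enable them if either reports "Value": false (one attribute per call)
aws ec2 modify-vpc-attribute --vpc-id vpc-0abc123 --enable-dns-hostnames '{"Value": true}'
aws ec2 modify-vpc-attribute --vpc-id vpc-0abc123 --enable-dns-support '{"Value": true}'
```

Service discovery records in the private hosted zone only resolve once both attributes are true.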
Ref: https://github.com/aws/copilot-cli/issues/2211

Related

Health Check on Fabric CA

I have a Hyperledger Fabric network v2.2.0 deployed with 2 peer orgs and an orderer org in a Kubernetes cluster. Each org has its own CA server. The CA pod keeps restarting sometimes. In order to know whether the service of the CA server is reachable or not, I am trying to use the healthz API on port 9443.
I have used the livenessProbe condition in the CA deployment like so:
livenessProbe:
  failureThreshold: 3
  httpGet:
    path: /healthz
    port: 9443
    scheme: HTTP
  initialDelaySeconds: 10
  periodSeconds: 10
  successThreshold: 1
  timeoutSeconds: 1
After configuring this liveness probe, the pod keeps on restarting with the event Liveness probe failed: HTTP probe failed with status code: 400. Why might this be happening?
HTTP 400 code:
The HTTP 400 Bad Request response status code indicates that the server cannot or will not process the request due to something that is perceived to be a client error (for example, malformed request syntax, invalid request message framing, or deceptive request routing).
This indicates that Kubernetes is sending the data in a way Hyperledger is rejecting, but without more information it is hard to say where the problem is. Some quick checks to start with:
Send some GET requests directly to the Hyperledger /healthz resource yourself. What do you get? You should get back either a 200 "OK" if everything is functioning, or a 503 "Service Unavailable" with details of which nodes are down (docs).
kubectl describe pod liveness-request. You should see a few lines towards the bottom describing the state of the liveness probe in more detail:
Restart Count: 0
.
.
.
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Normal Scheduled <unknown> default-scheduler Successfully assigned example-dc/liveness-request to dcpoz-d-sou-k8swor3
Normal Pulling 4m45s kubelet, dcpoz-d-sou-k8swor3 Pulling image "nginx"
Normal Pulled 4m42s kubelet, dcpoz-d-sou-k8swor3 Successfully pulled image "nginx"
Normal Created 4m42s kubelet, dcpoz-d-sou-k8swor3 Created container liveness
Normal Started 4m42s kubelet, dcpoz-d-sou-k8swor3 Started container liveness
Some other things to investigate:
httpGet options that might be helpful:
scheme – Protocol type HTTP or HTTPS
httpHeaders – Custom headers to set in the request
Have you configured the operations service?
You may need a valid client certificate (if TLS is enabled, and clientAuthRequired is set to true).
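One cause worth ruling out first: if TLS is enabled on the CA's operations endpoint, a plain-HTTP probe will be rejected. A hedged sketch of the probe with the scheme switched to HTTPS (the values here are assumptions to adjust to your CA config, not taken from the question):

```yaml
livenessProbe:
  failureThreshold: 3
  httpGet:
    path: /healthz
    port: 9443
    scheme: HTTPS   # should match whether TLS is enabled on the operations endpoint
  initialDelaySeconds: 10
  periodSeconds: 10
  timeoutSeconds: 1
```

Note that with scheme: HTTPS the kubelet skips certificate verification but cannot present a client certificate, so if clientAuthRequired is true the probe will still fail and you may need an exec or tcpSocket probe instead.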

proxy-public not coming up in AWS EKS

I used the following Terraform link to create an EKS cluster, then followed the steps outlined here to install JupyterHub.
However, the proxy-public service doesn't come up:
kubectl describe svc proxy-public -n jhub
Name: proxy-public
Namespace: jhub
Labels: app=jupyterhub
app.kubernetes.io/managed-by=Helm
chart=jupyterhub-0.10.6
component=proxy-public
heritage=Helm
release=jhub
Annotations: meta.helm.sh/release-name: jhub
meta.helm.sh/release-namespace: jhub
Selector: component=proxy,release=jhub
Type: LoadBalancer
Port: http 80/TCP
TargetPort: http/TCP
Session Affinity: None
External Traffic Policy: Cluster
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Normal EnsuringLoadBalancer 2m40s (x10 over 22m) service-controller Ensuring load balancer
Warning SyncLoadBalancerFailed 2m40s (x10 over 22m) service-controller Error syncing load balancer: failed to ensure load balancer: could not find any suitable subnets for creating the ELB
I have already verified that the subnets are tagged correctly, as explained here:
aws ec2 describe-subnets --query "Subnets[].[SubnetID,Tags[]]" --output text
None
kubernetes.io/cluster/education-eks-ZPQBVzm1 shared
kubernetes.io/role/elb 1
Name dev-vpc-public-us-west-2c
None None
None
kubernetes.io/cluster/education-eks-ZPQBVzm1 shared
kubernetes.io/role/elb 1
Name dev-vpc-public-us-west-2a
None
kubernetes.io/cluster/education-eks-ZPQBVzm1 shared
kubernetes.io/role/internal-elb 1
Name dev-vpc-private-us-west-2c
None
Name dev-vpc-private-us-west-2b
kubernetes.io/cluster/education-eks-ZPQBVzm1 shared
kubernetes.io/role/internal-elb 1
None None
None None
None None
None
Name dev-vpc-public-us-west-2b
kubernetes.io/role/elb 1
kubernetes.io/cluster/education-eks-ZPQBVzm1 shared
None
Name dev-vpc-private-us-west-2a
kubernetes.io/cluster/education-eks-ZPQBVzm1 shared
kubernetes.io/role/internal-elb 1
Any idea what might be causing this?
I figured out the issue: the Terraform template here attaches a different cluster_name tag to the subnets (in vpc.tf):
locals {
  cluster_name = "education-eks-${random_string.suffix.result}"
}
I wanted a different name, so I had modified cluster_name in eks.tf. I should have modified this local instead.
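In other words, override the name where the local is defined, so the kubernetes.io/cluster/<name> subnet tags keep matching the cluster name the service controller looks for. A sketch, with "my-cluster" as a placeholder:

```hcl
# vpc.tf - change the name here, not in eks.tf, so every resource that
# interpolates local.cluster_name (including the subnet tags) stays in sync
locals {
  cluster_name = "my-cluster-${random_string.suffix.result}"
}
```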

Kubernetes dashboard authentication on atomic host

I am a total newbie in terms of Kubernetes/Atomic Host, so my question may be really trivial or well discussed already - but unfortunately I couldn't find any clues on how to achieve my goal - that's why I am here.
I have set up a Kubernetes cluster on Atomic Hosts (right now I have just one master and one node). I am working in a cloud network, on virtual machines.
[root@master ~]# kubectl get node
NAME STATUS AGE
192.168.2.3 Ready 9d
After a lot of fuss i managed to set up the kubernetes dashboard UI on my master.
[root@master ~]# kubectl describe pod --namespace=kube-system
Name: kubernetes-dashboard-3791223240-8jvs8
Namespace: kube-system
Node: 192.168.2.3/192.168.2.3
Start Time: Thu, 07 Sep 2017 10:37:31 +0200
Labels: k8s-app=kubernetes-dashboard
pod-template-hash=3791223240
Status: Running
IP: 172.16.43.2
Controllers: ReplicaSet/kubernetes-dashboard-3791223240
Containers:
kubernetes-dashboard:
Container ID: docker://8fddde282e41d25c59f51a5a4687c73e79e37828c4f7e960c1bf4a612966420b
Image: gcr.io/google_containers/kubernetes-dashboard-amd64:v1.6.3
Image ID: docker-pullable://gcr.io/google_containers/kubernetes-dashboard-amd64@sha256:2c4421ed80358a0ee97b44357b6cd6dc09be6ccc27dfe9d50c9bfc39a760e5fe
Port: 9090/TCP
Args:
--apiserver-host=http://192.168.2.2:8080
Limits:
cpu: 100m
memory: 300Mi
Requests:
cpu: 100m
memory: 100Mi
State: Running
Started: Fri, 08 Sep 2017 10:54:46 +0200
Last State: Terminated
Reason: Error
Exit Code: 2
Started: Thu, 07 Sep 2017 10:37:32 +0200
Finished: Fri, 08 Sep 2017 10:54:44 +0200
Ready: True
Restart Count: 1
Liveness: http-get http://:9090/ delay=30s timeout=30s period=10s #success=1 #failure=3
Volume Mounts: <none>
Environment Variables: <none>
Conditions:
Type Status
Initialized True
Ready True
PodScheduled True
No volumes.
QoS Class: Burstable
Tolerations: <none>
Events:
FirstSeen LastSeen Count From SubObjectPath Type Reason Message
--------- -------- ----- ---- ------------- -------- ------ -------
1d 32m 3 {kubelet 192.168.2.3} Warning MissingClusterDNS kubelet does not have ClusterDNS IP configured and cannot create Pod using "ClusterFirst" policy. Falling back to DNSDefault policy.
1d 32m 2 {kubelet 192.168.2.3} spec.containers{kubernetes-dashboard} Normal Pulled Container image "gcr.io/google_containers/kubernetes-dashboard-amd64:v1.6.3" already present on machine
32m 32m 1 {kubelet 192.168.2.3} spec.containers{kubernetes-dashboard} Normal Created Created container with docker id 8fddde282e41; Security:[seccomp=unconfined]
32m 32m 1 {kubelet 192.168.2.3} spec.containers{kubernetes-dashboard} Normal Started Started container with docker id 8fddde282e41
also
[root@master ~]# kubectl cluster-info
Kubernetes master is running at http://localhost:8080
kubernetes-dashboard is running at http://localhost:8080/api/v1/proxy/namespaces/kube-system/services/kubernetes-dashboard
Now, when I tried connecting to the dashboard (I tried accessing the dashboard via a browser on a Windows virtual machine in the same cloud network) using the address:
https://192.168.218.2:6443/api/v1/proxy/namespaces/kube-system/services/kubernetes-dashboard
I am getting "unauthorized". I believe it proves that the dashboard is indeed running under this address, but I need to set up some way of accessing it?
What I want to achieve in the long term:
I want to enable connecting to the dashboard using a login/password (later, when I learn a bit more, I will think about authenticating by certs or something safer than a password) from outside of the cloud network. For now, connecting to the dashboard at all would do.
I know there are threads about authenticating, but most of them are mentioning something like:
Basic authentication is enabled by passing the
--basic-auth-file=SOMEFILE option to API server
And this is the part I cannot cope with - I have no idea how to pass options to the API server.
On the Atomic Host the api-server, kube-controller-manager and kube-scheduler are running in containers, so I get into the api-server container with the command:
docker exec -it kube-apiserver.service bash
I saw a few times that I should edit a .json file in the /etc/kubernetes/manifest directory, but unfortunately there is no such file (or even such a directory).
I apologize if my problem is too trivial or not described well enough, but I'm new to (both) the IT world and Stack Overflow.
I would love to provide more info, but I am afraid I would end up including lots of useless information, so I decided to wait for your instructions in that regard.
Check out the wiki pages of the Kubernetes dashboard; they describe how to get access to the dashboard and how to authenticate to it. For quick access you can run:
kubectl proxy
And then go to following address:
http://localhost:8001/api/v1/namespaces/kube-system/services/https:kubernetes-dashboard:/proxy
You'll see two options: one is uploading your ~/.kube/config file, and the other is using a token. You can get a token by running the following command:
kubectl -n kube-system describe secret $(kubectl -n kube-system get secret | grep service-account-token | head -n 1 | awk '{print $1}')
Now just copy and paste the long token string into dashboard prompt and you're done.
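If none of the existing service-account tokens has enough permissions (on an RBAC-enabled cluster), one common approach is to create a dedicated service account and bind it to cluster-admin; a sketch with placeholder names:

```shell
# Create a dashboard login identity (names are placeholders)
kubectl -n kube-system create serviceaccount dashboard-admin
kubectl create clusterrolebinding dashboard-admin \
  --clusterrole=cluster-admin \
  --serviceaccount=kube-system:dashboard-admin
```

Then fetch that account's token with the describe secret command above.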

Celery workers stalled on boot

We boot up a cluster of 250 worker nodes in AWS at night to handle some long-running distributed tasks.
The worker nodes are running celery with the following command:
celery -A celery_worker worker --concurrency=1 -l info -n background_cluster.i-1b1a0dbb --without-heartbeat --without-gossip --without-mingle -- celeryd.prefetch_multiplier=1
We are using rabbitmq as our broker, and there is only 1 rabbitmq node.
About 60% of our nodes claim to be listening, but will not pick up any tasks.
Their logs look like this:
-------------- celery@background_cluster.i-1b1a0dbb v3.1.18 (Cipater)
---- **** -----
--- * *** * -- Linux-3.2.0-25-virtual-x86_64-with-Ubuntu-14.04-trusty
-- * - **** ---
- ** ---------- [config]
- ** ---------- .> app: celery_worker:0x7f10c2235cd0
- ** ---------- .> transport: amqp://guest:**@localhost:5672//
- ** ---------- .> results: disabled
- *** --- * --- .> concurrency: 1 (prefork)
-- ******* ----
--- ***** ----- [queues]
-------------- .> background_cluster exchange=root(direct) key=background_cluster
[tasks]
. more.celery_worker.background_cluster
[2015-10-10 00:20:17,110: WARNING/MainProcess] celery@background_cluster.i-1b1a0dbb
[2015-10-10 00:20:17,110: WARNING/MainProcess] consuming from
[2015-10-10 00:20:17,110: WARNING/MainProcess] {'background_cluster': <unbound Queue background_cluster -> <unbound Exchange root(direct)> -> background_cluster>}
[2015-10-10 00:20:17,123: INFO/MainProcess] Connected to amqp://our_server:**@10.0.11.136:5672/our_server
[2015-10-10 00:20:17,144: WARNING/MainProcess] celery@background_cluster.i-1b1a0dbb ready.
However, rabbitmq shows that there are messages waiting in the queue.
If I login to any of the worker nodes and issue this command:
celery -A celery_worker inspect active
...then every (previously stalled) worker node immediately grabs a task and starts cranking.
Any ideas as to why?
Might it be related to these switches?
--without-heartbeat --without-gossip --without-mingle
It turns out that this was a bug in celery where using --without-gossip kept events from draining. Celery's implementation of gossip is pretty new, and it apparently implicitly takes care of draining events, but when you turn it off things get a little wonky.
The details to the issue are outlined in this github issue: https://github.com/celery/celery/issues/1847
Master currently has the fix in this PR: https://github.com/celery/celery/pull/2823
So you can solve this one of three ways:
Use gossip (remove --without-gossip)
Patch your version of celery with https://github.com/celery/celery/pull/2823.patch
Use a cron job to run a celery inspect active regularly
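For the cron workaround, a crontab sketch (the project path is a placeholder; the app module name follows the question's command line):

```shell
# Poke the workers every 5 minutes so pending events get drained
*/5 * * * * cd /path/to/project && celery -A celery_worker inspect active > /dev/null 2>&1
```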

Sendmail Host unknown error

I am trying to send a mail using sendmail, but the mail does not get delivered.
In the /var/mail/root I get this error
----- The following addresses had permanent fatal errors -----
<noreply@xxxxx.com>
(reason: 550 Host unknown)
----- Transcript of session follows -----
550 5.1.2 <noreply@xxxxx.com>... Host unknown (Name server: xxxxx.com: host not found)
Did you configure Sendmail? Here is an Ubuntu guide: http://developernote.com/2012/07/how-i-configured-sendmail-for-php-on-ubuntu-server-12-04/ or google "install sendmail".
Can you get MX or A DNS records of xxxxx.com?
NO => it is not Sendmail's fault.
A "non sendmail" test:
dig MX xxxxx.com
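If the MX lookup comes back empty, checking for an A record as well tells you whether the domain resolves at all, since mail delivery falls back to the A record when no MX exists:

```shell
# +short prints only the answer section; empty output means no record
dig +short MX xxxxx.com
dig +short A xxxxx.com
```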