ResourceInitializationError: failed to validate logger args: : signal: killed - aws-fargate

Suddenly getting the message "ResourceInitializationError: failed to validate logger args: : signal: killed" while starting an AWS ECS Fargate service. The same service was running fine a couple of days ago.
Following is the log driver configuration in the related AWS task:
Log Configuration
Log driver: awslogs
Key                     Value
awslogs-group           /ecs/analytics
awslogs-region          us-east-1
awslogs-stream-prefix   ecs
Any idea or help?

I finally found the root cause:
The error appears if the Fargate service is not able to connect to the CloudWatch API endpoint.
This can happen if Fargate is running in a private subnet without internet access.
You can either add a CloudWatch Logs VPC endpoint to your private subnet or add internet connectivity.
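If you go the endpoint route, a rough sketch with the AWS CLI could look like this (the VPC, subnet, and security group IDs are placeholders; the region is taken from the question). The endpoint's security group also has to allow inbound HTTPS (443) from the Fargate tasks:
aws ec2 create-vpc-endpoint \
  --vpc-id vpc-0123456789abcdef0 \
  --vpc-endpoint-type Interface \
  --service-name com.amazonaws.us-east-1.logs \
  --subnet-ids subnet-0123456789abcdef0 \
  --security-group-ids sg-0123456789abcdef0 \
  --private-dns-enabled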

I recently spent hours on this same issue. It turns out that the log group and stream prefix specified in my container definition didn't exist.
It would be wonderful if AWS could provide helpful error messages...
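If you hit the same thing, a quick way to create the missing log group by hand (group name and region taken from the question above) would be something like:
aws logs create-log-group --log-group-name /ecs/analytics --region us-east-1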

Came across this issue today. The issue was that the log group I specified didn't exist yet. If you don't want to create it manually, make sure to add the awslogs-create-group option and set it to "true". You'll also have to grant your ECS task execution role the logs:CreateLogGroup permission.
"logConfiguration": {
"logDriver": "awslogs",
"secretOptions": null,
"options": {
"awslogs-create-group": "true",
"awslogs-group": "/ecs/app",
"awslogs-region": "ap-southeast-2",
"awslogs-stream-prefix": "ecs"
}
}
Reference: https://docs.aws.amazon.com/AmazonECS/latest/developerguide/using_awslogs.html
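For the permission part, a rough sketch of attaching an inline policy with the AWS CLI (the role name ecsTaskExecutionRole and the policy name here are assumptions about your setup):
aws iam put-role-policy \
  --role-name ecsTaskExecutionRole \
  --policy-name allow-create-log-group \
  --policy-document '{
    "Version": "2012-10-17",
    "Statement": [
      {"Effect": "Allow", "Action": "logs:CreateLogGroup", "Resource": "*"}
    ]
  }'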

I just experienced this. I have ECS Fargate running, and I had just added a VPC endpoint for CloudWatch Logs (com.amazonaws.REGION.logs) in my account. When I added the VPC endpoint my logs stopped appearing.
In order to remedy this without deleting the VPC endpoint again, for my setup with Fargate running with internet access I had to ensure that:
My ECS service had a security group rule that allows HTTPS traffic outbound:
{
  type: egress
  port_to: 443
  port_from: 443
  protocol: TCP
}
My new VPC endpoint had a security group rule to allow HTTPS traffic inbound from my ECS security group:
{
  type: ingress
  port_to: 443
  port_from: 443
  protocol: TCP
  source_security_group_id: [Your ECS SECURITY GROUP ID]
}
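For anyone scripting this, roughly the same two rules with the AWS CLI (the security group IDs are placeholders; the egress rule is opened to 0.0.0.0/0 here only for simplicity):
# Outbound HTTPS from the ECS service's security group
aws ec2 authorize-security-group-egress \
  --group-id sg-ECS_SERVICE \
  --protocol tcp --port 443 --cidr 0.0.0.0/0
# Inbound HTTPS on the VPC endpoint's security group, from the ECS security group
aws ec2 authorize-security-group-ingress \
  --group-id sg-VPC_ENDPOINT \
  --protocol tcp --port 443 \
  --source-group sg-ECS_SERVICE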

I got this error, checked my NAT and internet gateway, and all was good. I also found that the interface endpoint was set up as com.amazonaws.us-east-1.logs.
Nothing seemed to need changing. Finally, I deleted the interface endpoint and the error went away.
But I am still confused about what happened.

Related

Connectivity to AWS EKS control plane via Client VPN

I have created an EKS cluster with API server endpoint access set to "Private". The cluster is configured in a private subnet. I'd like to allow kubectl access from my local PC. I have created a Client VPN, and it has access to the private network (verified by SSH to an EC2 instance running in the same private subnet). But kubectl gets "unable to connect to the server: dial x.x.x.x:443 i/o timeout". "aws eks update-kubeconfig" can see the cluster and updates the local context properly. What could be the problem?
Found out what was missing: 443 had to be enabled in the authorization rules.
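For anyone in the same spot, roughly what that looks like with the AWS CLI (the endpoint ID and CIDR are placeholders; the authorization rule opens the route to the cluster's subnets, and the cluster security group still has to allow 443 from the VPN client range):
aws ec2 authorize-client-vpn-ingress \
  --client-vpn-endpoint-id cvpn-endpoint-0123456789abcdef0 \
  --target-network-cidr 10.0.0.0/16 \
  --authorize-all-groups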

In Cloud Foundry, how do I create a service to run my Apache web server?

I'm on Ubuntu 18, running the following version of Cloud Foundry ...
$ cf -v
cf version 7.4.0+e55633fed.2021-11-15
I would like to set up several containers, each running off a Docker image. The first is an Apache web server. I have the following Dockerfile:
FROM httpd:2.4
COPY ./my-httpd.conf /usr/local/apache2/conf/httpd.conf
COPY ./my-vhosts.conf /usr/local/apache2/conf/extra/httpd-vhosts.conf
COPY ./directory /usr/local/apache2/htdocs/directory
How do I set this up in Cloud Foundry? I tried creating a service but got these errors:
$ cf cups apache-service -p "localhost, 80"
FAILED
No API endpoint set. Use 'cf login' or 'cf api' to target an endpoint.
When I tried to create this API endpoint I got
$ cf api "http://my_ip_address"
Setting API endpoint to http://my_ip_address...
Request error: Get "http://my_ip_address": dial tcp my_ip_address:80: connect: connection refused
TIP: If you are behind a firewall and require an HTTP proxy, verify the https_proxy environment variable is correctly set. Else, check your network connection.
I'm thinking I'm missing something rather substantial but don't know what the right questions to ask are.
The error message you are providing (dial tcp my_ip_address:80: connect: connection refused) means the cf api $address is not responding.
Ensure that your Cloud Foundry API endpoint is still active and that you don't have any firewall preventing you from accessing the API (the port is open, the process is running, and the firewall is allowing traffic from your IP, if applicable).
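A quick way to sanity-check this from your machine (api.my-cf-domain.example is a placeholder; a Cloud Foundry API is normally served over HTTPS, not plain HTTP on a bare IP):
# Should return JSON describing the API if the endpoint is reachable
curl -k https://api.my-cf-domain.example/v2/info
# Then target and log in
cf api https://api.my-cf-domain.example --skip-ssl-validation
cf login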

AWS CLI S3 list with a default endpoint

I'm using the following command on some EC2 instances in order to get some configuration files from an S3 bucket. The instances have an instance role attached with full S3 permissions:
aws s3 cp s3://bucket-name/file ./ --region eu-west-1
This works as expected on some instances provided by me with a default AMI, but on some existing instances in the same region and AZ with the same instance role I'm facing the following error:
Connect timeout on endpoint URL: "https://bucket-name.eu-west-1.amazonaws.com/?list-type=2&delimiter=%2F&prefix=&encoding-type=url"
failed to run commands: exit status 255
My question is: why is the S3 URI not prefixed with s3://, and why does the error return an https:// URL string? It's clear that this AWS CLI version tries to reach S3 through https and not the s3:// endpoint provided by me in the command. Is there any way to override this?
My question is: why is the S3 URI not prefixed with s3://, and why does the error return an https:// URL string?
Behind the scenes, the AWS CLI calls AWS services over HTTPS, which is why on a timeout you see https://bucket-name.eu-west-1... instead of s3://.
By default, the AWS CLI sends requests to AWS services by using HTTPS on TCP port 443. To use the AWS CLI successfully, you must be able to make outbound connections on TCP port 443.
Reference: AWS CLI User Guide ("Using the AWS CLI", cli-chap-using)
The timeout on some instances might be because they are in a private subnet without a NAT gateway.
You can quickly verify this by running ping google.com; if it does not respond, the instance is in a private subnet without NAT or has no outbound traffic allowed.
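If ping is blocked (ICMP often is), an HTTPS check against S3 from the affected instance is closer to what the CLI actually does; and if the subnet really is private with no NAT, an S3 gateway endpoint gives the CLI a path to S3. Both are sketches with placeholder IDs:
# From the instance: should print an HTTP status code if S3 is reachable
curl -sS -m 5 -o /dev/null -w '%{http_code}\n' https://bucket-name.s3.eu-west-1.amazonaws.com/
# Gateway endpoint for S3, attached to the private subnet's route table
aws ec2 create-vpc-endpoint \
  --vpc-id vpc-0123456789abcdef0 \
  --service-name com.amazonaws.eu-west-1.s3 \
  --route-table-ids rtb-0123456789abcdef0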

kubernetes authentication against the API server

I have set up a Kubernetes cluster from scratch. This just means I did not use services provided by others, but used the k8s installer itself. Before, we used to have other clusters, but with providers, and they give you the TLS cert and key for auth, etc. Now that this cluster was set up by myself, I have access via kubectl:
$ kubectl get all
NAME                 TYPE        CLUSTER-IP   EXTERNAL-IP   PORT(S)   AGE
service/kubernetes   ClusterIP   10.96.0.1    <none>        443/TCP   21h
$
I also tried this, and I can add a custom key, but then when I try to query via curl I get pods is forbidden: User "system:anonymous" cannot list resource "pods" in API group "" at the cluster scope.
I cannot figure out where I can get the cert and key for a user to authenticate against the API using TLS auth. I have tried to understand the official docs, but I have gotten nowhere. Can someone help me find where those files are, or how to add or get certificates that I can use for the REST API?
Edit 1: my .kube/config file looks like this:
apiVersion: v1
clusters:
- cluster:
    certificate-authority-data: LS0t(...)=
    server: https://private_IP:6443
  name: kubernetes
contexts:
- context:
    cluster: kubernetes
    user: kubernetes-admin
  name: kubernetes-admin@kubernetes
current-context: kubernetes-admin@kubernetes
kind: Config
preferences: {}
users:
- name: kubernetes-admin
  user:
    client-certificate-data: LS0tLS(...)Qo=
    client-key-data: LS0(...)tCg==
It works from localhost just normally.
On the other hand, I noticed something. From localhost I can access the cluster by generating the token using this method.
Also note that for now I do not mind about creating multiple roles for multiple users, etc. I just need access to the API from remote, and it can use "default" authentication or roles.
Now when I try to do the same from remote I get the following:
I tried using that config to run kubectl get all from remote; it runs for a while and then ends in Unable to connect to the server: dial tcp private_IP:6443: i/o timeout.
This happens because the config has private_IP, so I changed the IP to Public_IP:6443 and now get the following: Unable to connect to the server: x509: certificate is valid for some_private_IP, My_private_IP, not Public_IP
Keep in mind that this is an AWS EC2 instance with an Elastic IP (you can think of an Elastic IP as just a public IP on a traditional setup, but this public IP is on your public router, and the router routes requests to your actual server on the private network). For the AWS fans, like I said, I cannot use the EKS service here.
So how do I get this to work with the public IP?
It seems your main problem is the TLS server certificate validation.
One option is to tell kubectl to skip the validation of the server certificate:
kubectl --insecure-skip-tls-verify ...
This obviously has the potential to be "insecure", but that depends on your use case.
Another option is to recreate the cluster with the public IP address added to the server certificate. It should also be possible to recreate only the certificate with kubeadm, without recreating the cluster. Details about the latter two points can be found in this answer.
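For the kubeadm route, roughly this (a sketch for a kubeadm-built cluster; <Public_IP> is a placeholder, and you should back up /etc/kubernetes/pki first):
# Move the old apiserver serving certificate aside
sudo mv /etc/kubernetes/pki/apiserver.crt /etc/kubernetes/pki/apiserver.crt.bak
sudo mv /etc/kubernetes/pki/apiserver.key /etc/kubernetes/pki/apiserver.key.bak
# Regenerate it with the public IP added as an extra SAN
sudo kubeadm init phase certs apiserver --apiserver-cert-extra-sans <Public_IP>
# Then restart the kube-apiserver static pod (for example by restarting kubelet)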
You need to set up RBAC for the user: define Roles and RoleBindings. Follow this link for reference: https://docs.bitnami.com/kubernetes/how-to/configure-rbac-in-your-kubernetes-cluster/
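A minimal sketch of what that can look like with kubectl (the role, binding, user, and namespace names are illustrative only):
# Read-only access to pods in the default namespace
kubectl create role pod-reader --verb=get,list,watch --resource=pods --namespace default
# Bind it to the user your client certificate or token identifies as
kubectl create rolebinding pod-reader-binding --role=pod-reader --user=remote-user --namespace default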

Cannot link an HTTP Load Balancer to a backend (502 Bad Gateway)

I have on the backend a Kubernetes node running on port 32656 (a Kubernetes Service of type NodePort). If I create a firewall rule for <node_ip>:32656 to allow traffic, I can open the backend in the browser at this address: http://<node_ip>:32656.
What I am trying to achieve now is creating an HTTP Load Balancer and linking it to the above backend. I use the following script to create the required infrastructure:
#!/bin/bash
GROUP_NAME="gke-service-cluster-61155cae-group"
HEALTH_CHECK_NAME="test-health-check"
BACKEND_SERVICE_NAME="test-backend-service"
URL_MAP_NAME="test-url-map"
TARGET_PROXY_NAME="test-target-proxy"
GLOBAL_FORWARDING_RULE_NAME="test-global-rule"
NODE_PORT="32656"
PORT_NAME="http"
# instance group named ports
gcloud compute instance-groups set-named-ports "$GROUP_NAME" --named-ports "$PORT_NAME:$NODE_PORT"
# health check
gcloud compute http-health-checks create --format none "$HEALTH_CHECK_NAME" --check-interval "5m" --healthy-threshold "1" --timeout "5m" --unhealthy-threshold "10"
# backend service
gcloud compute backend-services create "$BACKEND_SERVICE_NAME" --http-health-check "$HEALTH_CHECK_NAME" --port-name "$PORT_NAME" --timeout "30"
gcloud compute backend-services add-backend "$BACKEND_SERVICE_NAME" --instance-group "$GROUP_NAME" --balancing-mode "UTILIZATION" --capacity-scaler "1" --max-utilization "1"
# URL map
gcloud compute url-maps create "$URL_MAP_NAME" --default-service "$BACKEND_SERVICE_NAME"
# target proxy
gcloud compute target-http-proxies create "$TARGET_PROXY_NAME" --url-map "$URL_MAP_NAME"
# global forwarding rule
gcloud compute forwarding-rules create "$GLOBAL_FORWARDING_RULE_NAME" --global --ip-protocol "TCP" --ports "80" --target-http-proxy "$TARGET_PROXY_NAME"
But I get the following response from the Load Balancer accessed through the public IP in the Frontend configuration:
Error: Server Error
The server encountered a temporary error and could not complete your
request. Please try again in 30 seconds.
The health check is left with default values (/ and 80), and the backend service responds quickly with a status 200.
I have also created the firewall rule to accept any source and all ports (tcp) and no target specified (i.e. all targets).
Considering that regardless of the port I choose (in the instance group) I get the same result (Server Error), the problem should be somewhere in the configuration of the HTTP Load Balancer (something with the health checks, maybe?).
What am I missing from completing the linking between the frontend and the backend?
I assume you actually have instances in the instance group, and the firewall rule is not specific to a source range. Can you check your logs for a Google health check? (The User-Agent will have "google" in it.)
What version of Kubernetes are you running? FYI, there's a resource in 1.2 that hooks this up for you automatically: http://kubernetes.io/docs/user-guide/ingress/; just make sure you do these: https://github.com/kubernetes/contrib/blob/master/ingress/controllers/gce/BETA_LIMITATIONS.md.
More specifically: in 1.2 you need to create a firewall rule, a Service of type=NodePort (both of which you already seem to have), and a health check on that Service at "/" (which you don't have; this requirement is alleviated in 1.3, but 1.3 is not out yet).
Also note that you can't put the same instance into two load-balanced IGs, so to use the Ingress mentioned above you will have to clean up your existing load balancer (or at least remove the instances from the IG and free up enough quota so the Ingress controller can do its thing).
There can be a few things wrong among those mentioned:
firewall rules need to be open to all hosts, or they need to carry the same network tag as the machines in the instance group (see the gcloud sketch after this answer)
by default, the node should return 200 at /; configuring readiness and liveness probes to use anything else did not work for me
It seems you are doing by hand things that are automated for you, so I can really recommend:
https://cloud.google.com/kubernetes-engine/docs/how-to/load-balance-ingress
This shows the steps that do the firewall and port forwarding for you, which may also show you what you are missing.
I noticed myself, when using an app on 8080 exposed on 80 (like one of the deployments in the example), that the load balancer stayed unhealthy until I had / returning 200 (and /healthz, which I added too). So basically that container now exposes a web server on port 8080, returning that, and the other config wires it up to port 80.
When it comes to firewall rules, make sure they apply to all machines or the network tag matches, or they won't work.
The 502 error usually comes from the load balancer, which will not pass your request on if the health check does not pass.
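As a sketch of the firewall part (the port is the NodePort from the question, and 130.211.0.0/22 and 35.191.0.0/16 are Google's documented health-check source ranges; the target tag is only a guess derived from the instance group name, so check the actual tag on your nodes first):
gcloud compute firewall-rules create allow-lb-health-checks \
  --source-ranges "130.211.0.0/22,35.191.0.0/16" \
  --target-tags "gke-service-cluster-61155cae-node" \
  --allow "tcp:32656"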
Could you make your service type LoadBalancer (http://kubernetes.io/docs/user-guide/services/#type-loadbalancer), which would set this all up automatically? This assumes you have the cloud provider flag set for Google Cloud.
After you deploy, describe the service by name and it should give you the endpoint that was assigned.
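A minimal sketch of that flow (the deployment name and ports are placeholders):
# Expose an existing deployment through a cloud load balancer
kubectl expose deployment my-app --type=LoadBalancer --port=80 --target-port=8080
# The assigned external endpoint shows up under "LoadBalancer Ingress"
kubectl describe service my-app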