Terraform - CloudWatch alarms for ElastiCache cluster metrics - amazon-elasticache

Can someone suggest how we can create alarms for an ElastiCache cluster on "CPUUtilization" and "FreeableMemory" using Terraform?
ElastiCache seems to be an exception where we are unable to get cluster-level metrics. The current workaround seems to be creating alarms at node level.
I haven't tried the module below, but it looks like a workaround:
https://github.com/azavea/terraform-aws-redis-elasticache/blob/develop/main.tf

It is possible, and documented here.

Here is my Terraform module to create CloudWatch alerts for RDS and cache clusters at node level.
https://bitbucket.org/rkkrishnaa/terraform/src/master/
I have added a Jenkinsfile to deploy the alerts through CI.
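For reference, here is a minimal sketch of the node-level approach described above, assuming the Terraform AWS provider. The variable names (cache_cluster_ids, alarm_actions), the resource names, and the thresholds are hypothetical placeholders and are not taken from either linked module.

```hcl
# Hypothetical inputs: the IDs of the individual cache clusters (nodes)
# and the ARNs (e.g. SNS topics) to notify. Adjust to your setup.
variable "cache_cluster_ids" {
  type = list(string)
}

variable "alarm_actions" {
  type    = list(string)
  default = []
}

# One CPU alarm per cache cluster ID, i.e. at node level.
resource "aws_cloudwatch_metric_alarm" "cache_cpu" {
  for_each = toset(var.cache_cluster_ids)

  alarm_name          = "${each.value}-cpu-utilization"
  alarm_description   = "ElastiCache node CPU utilization"
  namespace           = "AWS/ElastiCache"
  metric_name         = "CPUUtilization"
  statistic           = "Average"
  period              = 300
  evaluation_periods  = 1
  comparison_operator = "GreaterThanThreshold"
  threshold           = 75 # percent; placeholder value

  dimensions = {
    CacheClusterId = each.value
  }

  alarm_actions = var.alarm_actions
}

# One memory alarm per cache cluster ID. FreeableMemory is reported in bytes.
resource "aws_cloudwatch_metric_alarm" "cache_memory" {
  for_each = toset(var.cache_cluster_ids)

  alarm_name          = "${each.value}-freeable-memory"
  alarm_description   = "ElastiCache node freeable memory"
  namespace           = "AWS/ElastiCache"
  metric_name         = "FreeableMemory"
  statistic           = "Average"
  period              = 60
  evaluation_periods  = 1
  comparison_operator = "LessThanThreshold"
  threshold           = 100000000 # about 100 MB; placeholder value

  dimensions = {
    CacheClusterId = each.value
  }

  alarm_actions = var.alarm_actions
}
```

If the nodes belong to a replication group managed in the same configuration, the member_clusters attribute of aws_elasticache_replication_group could likely be passed in as cache_cluster_ids instead of a hand-written list.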

Related

In Amazon EKS - how to view logs from before EKS Fargate node creation and logs from while pods are starting up

I'm using Amazon EKS on Fargate. I can see container logs using a Fluent Bit sidecar etc. with no problem at all. But those logs ONLY show what is happening inside the container AFTER it has started up.
I enabled AWS EKS cluster logging fully.
Now I would like to see logs in CloudWatch that are the equivalent of the output of the kubectl describe pod command.
I have searched the ENTIRE CloudWatch log group for the cluster and am not able to find logs like
"pulling image into container"
"efs not mounted"
etc
I want to see logs in CloudWatch from before the actual container creation stage.
Is this possible at all using EKS Fargate?
Thanks a bunch
You can use Container Insights, which collects metrics as performance log events using the embedded metric format. The logs are stored in CloudWatch Logs. CloudWatch generates several metrics automatically from the logs, which you can view in the CloudWatch console.
In Amazon EKS and Kubernetes, Container Insights uses a containerized version of the CloudWatch agent to discover all of the running containers in a cluster. It then collects performance data at every layer of the performance stack.
https://docs.aws.amazon.com/AmazonCloudWatch/latest/monitoring/Container-Insights-view-metrics.html

EKS pods logging to Elastic Cloud

I am trying to set up shipping of pod logs from EKS to Elasticsearch Cloud.
According to "Fluent Bit for Amazon EKS on AWS Fargate is here", Elasticsearch should be supported:
You can choose between CloudWatch, Elasticsearch, Kinesis Firehose and Kinesis Streams as outputs.
According to the Fluent Bit configuration parameters for Elasticsearch, having the Cloud_ID and Cloud_Auth parameters should be enough to ship logs to Elasticsearch Cloud.
An example here shows how to configure the ES output for Fluent Bit, so my config looks like:
[OUTPUT]
    Name            es
    Match           *
    Logstash_Format On
    Logstash_Prefix ${logstash_prefix}
    tls             On
    tls.verify      Off
    Pipeline        date_to_timestamp
    Cloud_ID        ${es_cloud_id}
    Cloud_Auth      ${es_cloud_auth}
    Trace_Output    On
I am running a simple nginx container to generate some logs (as in one of the linked examples), but they don't seem to appear in my Elasticsearch / Kibana.
Am I missing anything? How do I ship logs to Elasticsearch Cloud?
Also, Trace_Output On is supposed to log Fluent Bit's attempts to ship logs, but where can I see these logs on EKS?
I also ran into this. From what I can tell, only AWS Elasticsearch is supported when using the AWS-managed Fluent Bit.
https://aws.amazon.com/about-aws/whats-new/2020/12/amazon-eks-adds-built-in-logging-support-for-aws-fargate/
You can work around this by using a sidecar Fluent Bit container (which can send to Elasticsearch) if that's an option for you. You will need to modify the application so that its logs are written to the filesystem.
Or you can use the managed Fluent Bit with the CloudWatch output, subscribe to the log group with a Lambda function, and have that function send the logs to ES.
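The second workaround could be wired up roughly as follows. This is a sketch assuming the Terraform AWS provider, with an already deployed Lambda function that forwards events to Elasticsearch. The variable names and resource names are hypothetical, and the forwarding code inside the Lambda is not shown.

```hcl
# Hypothetical inputs: the CloudWatch log group the managed Fluent Bit
# writes to, and the ARN of an existing Lambda that forwards events to ES.
variable "log_group_name" {
  type = string
}

variable "forwarder_lambda_arn" {
  type = string
}

# Allow CloudWatch Logs to invoke the forwarder function.
resource "aws_lambda_permission" "allow_cloudwatch_logs" {
  statement_id  = "AllowExecutionFromCloudWatchLogs"
  action        = "lambda:InvokeFunction"
  function_name = var.forwarder_lambda_arn
  principal     = "logs.amazonaws.com"
}

# Subscribe the function to the log group.
resource "aws_cloudwatch_log_subscription_filter" "to_es" {
  name            = "ship-to-elasticsearch"
  log_group_name  = var.log_group_name
  filter_pattern  = "" # an empty pattern matches every log event
  destination_arn = var.forwarder_lambda_arn

  # The permission must exist before the subscription can be created.
  depends_on = [aws_lambda_permission.allow_cloudwatch_logs]
}
```

The Lambda receives batches of log events as base64-encoded, gzip-compressed payloads, so the function has to decode them before indexing into ES.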

Is high availability really not possible with AWS EMR instance fleets?

In the AWS EMR management guide (https://github.com/awsdocs/amazon-emr-management-guide/blob/master/doc_source/emr-instance-fleet.md) I read:
... the master instance fleet is only a single instance ...
Does this ultimately mean I cannot provision a highly available EMR cluster with instance fleets? Or am I missing something here?
Your understanding is correct. Although AWS introduced multiple master nodes for EMR high availability in 2019, this does not extend to instance fleets, and the official documentation you mentioned above confirms it.

0/3 nodes are available: 1 node(s) had taints that the pod didn't tolerate, 2 Insufficient cpu. MR3 Hive

I am trying to set up Hive using MR3 on a Kubernetes cluster hosted on AWS EC2. When I run the command run-hive.sh, the Hive server starts and the master DAG is initialised, but then it gets stuck on pending. When I describe the pod, this is the error message it shows. I have kept the resources to a minimum, so that should not be the issue, and I do not have any tainted nodes. If you know any alternative for running Hive on Kubernetes with access to S3, or a better way to implement MR3 Hive on a Kubernetes cluster, please share.
Description of one of the nodes
Based on the error in the title, I think the problem is that your cluster does not have enough CPU on your worker nodes, and the master node is tainted.
So the options are either increasing the resources on the workers or removing the taint from the master node so that you can schedule pods there.
Control plane node isolation
By default, your cluster will not schedule pods on the control-plane node for security reasons. If you want to be able to schedule pods on the control-plane node, e.g. for a single-machine Kubernetes cluster for development, run:
kubectl taint nodes --all node-role.kubernetes.io/master-
This will remove the node-role.kubernetes.io/master taint from any nodes that have it, including the control-plane node, meaning that the scheduler will then be able to schedule pods everywhere.

Can't deploy marketplace object on GKE

I have a running Kubernetes cluster on Google Cloud Platform.
I want to deploy a postgres image to my cluster.
When selecting the image and my cluster, I get the error:
insufficient OAuth scope
I have been reading about it for a few hours now and couldn't get it to work.
I managed to set the scope of the VM to allow APIs:
Cloud API access scopes
Allow full access to all Cloud APIs
But in the GKE cluster details, I see that everything is disabled except Stackdriver.
Why is it so difficult to deploy an image or to change the scope?
How can I modify the cluster permissions without deleting and recreating it?
The easiest way is to delete and recreate the cluster, because there is no direct way to modify the scopes of an existing cluster. However, there is a workaround: create a new node pool with the correct scopes and delete all of the old node pools. The cluster scopes will change to reflect the new node pool.
More details found on this post
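As an illustration of the node pool workaround, here is a sketch assuming the Terraform Google provider. The variable names, the pool name, the node count, and the machine type are hypothetical placeholders. The cloud-platform scope corresponds to "Allow full access to all Cloud APIs".

```hcl
# Hypothetical inputs: the name and location (zone or region) of the
# existing GKE cluster.
variable "cluster_name" {
  type = string
}

variable "location" {
  type = string
}

# New node pool whose nodes get the broad cloud-platform scope.
resource "google_container_node_pool" "full_scope" {
  name       = "pool-with-scopes" # placeholder name
  cluster    = var.cluster_name
  location   = var.location
  node_count = 1

  node_config {
    machine_type = "e2-medium" # placeholder machine type
    oauth_scopes = ["https://www.googleapis.com/auth/cloud-platform"]
  }
}
```

Once the new pool is ready and workloads have moved over, the old node pools can be deleted as described in the answer.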