CloudWatch query dimension - amazon-cloudwatch

Let's assume I have a custom namespace in CloudWatch called nameSP with the dimension PodID.
I collect the number of connections from each pod. Let's assume we have two pods, so we will get two Conn metrics. How can I get the number of pods from CloudWatch?

You can use metric math to count the metrics, like this:
TIME_SERIES(METRIC_COUNT(SEARCH('{nameSP,PodID} MetricName="THE_NAME_OF_YOUR_METRIC_WITH_NUM_OF_CONNECTIONS"', 'Average', 300)))
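If you need the pod count outside the console, the same expression can be evaluated with GetMetricData; a minimal boto3 sketch, assuming the connection metric is named Conn (the actual call is left commented out):

```python
# Sketch (not run against a live account): counting pods by evaluating the
# SEARCH/METRIC_COUNT expression with GetMetricData. "Conn" is an assumed
# metric name; substitute your own.
expression = ("TIME_SERIES(METRIC_COUNT(SEARCH("
              "'{nameSP,PodID} MetricName=\"Conn\"', 'Average', 300)))")
query = {
    "Id": "podCount",        # query ids must start with a lowercase letter
    "Expression": expression,
    "Period": 300,
}
# import boto3
# from datetime import datetime, timedelta, timezone
# now = datetime.now(timezone.utc)
# cw = boto3.client("cloudwatch")
# resp = cw.get_metric_data(MetricDataQueries=[query],
#                           StartTime=now - timedelta(hours=1), EndTime=now)
# resp["MetricDataResults"][0]["Values"] then holds the pod count per period
```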

Related

Prometheus query comparing different metrics with same set of labels

I'm trying to monitor whether a RabbitMQ queue:
has messages
does not have a consumer
is not called .*_retry
If a queue matches all three, I want to create an alert.
The individual metrics are no problem to find, but I can't work out how to AND the different metrics in one query, grouping them by a set of labels (i.e. instance, queue).
Is this even possible?
I'm using the latest version of Prometheus and scraping RabbitMQ via its built-in Prometheus metrics plugin.
For example, if you have two metrics from different exporters:
probe_success => Blackbox exporter
node_memory_MemTotal_bytes => Node exporter
Suppose they have two common labels: "instance" and "group".
If you use the following query:
sum by (instance, group) (node_memory_MemTotal_bytes)>20000000000 and sum by (instance, group) (probe_success)==1
You'll get the instance+group combinations that have more than 20 GB of memory and are up.
See more info about logical operators in the Prometheus documentation.
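For the original RabbitMQ case, the same pattern might look like this; the metric names are assumptions based on the built-in prometheus plugin (with per-object metrics enabled), so verify them against your own /metrics output:

```promql
# Alert when a non-retry queue has messages but no consumer.
sum by (instance, queue) (rabbitmq_queue_messages{queue!~".*_retry"}) > 0
  and
sum by (instance, queue) (rabbitmq_queue_consumers{queue!~".*_retry"}) == 0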

How to make CodeDeploy Blue/Green create CloudWatch alarms for custom metrics?

I am using the CloudWatch agent to create metrics for disk usage, memory usage, cpu, and a couple other things. I would like to aggregate metrics based on the autoscaling group, using "AutoScalingGroupName":"${aws:AutoScalingGroupName}".
However, I'm using Blue/Green deployments with CodeDeploy, which creates a copy of the Autoscaling Group. The alarms I originally made for aggregations on the autoscaling groups are gone, and I can't put a widget in my Dashboard that shows avg cpu, memory, etc.
My quick solution was to use a custom append_dimension that is set to a hardcoded value, and aggregate dimensions on that. Is there an automated way that AWS provides that I don't know about?
I don't have experience with the above scenario using the AWS console.
But since I mostly work with Terraform (infrastructure as code), you can do it like this:
dimensions = {
  AutoScalingGroupName = tolist(aws_codedeploy_deployment_group.autoScalingGroup.autoscaling_groups)[0]
}
Reason for converting it into a list: the output of
aws_codedeploy_deployment_group.autoScalingGroup.autoscaling_groups
is a set value (you can see this when you output the value of the deployment group's autoscaling groups: it uses the toset function), while the metric dimensions of a CloudWatch metric alarm expect a string. Converting the unordered set type to a list lets you access its first element, which is the copy of the autoscaling group newly created by CodeDeploy.

Cloudwatch Alarm across all dimensions based on metric name for custom metrics

We are publishing custom Cloudwatch metrics from our service and want to set up alarms if the value for a metric name breaches a threshold for any dimension. Here are the metrics we are publishing:
Namespace=SameName, MetricName=Fault, Dimensions=[Operation=A, Program=ServiceName]
Namespace=SameName, MetricName=Fault, Dimensions=[Operation=B, Program=ServiceName]
Namespace=SameName, MetricName=Fault, Dimensions=[Operation=C, Program=ServiceName]
We want to set up an alarm so that a Fault across any dimension puts it into the Alarm state.
As you can see, the value of the Operation dimension is different for each metric. Currently we only have these 3 operations, so I understand we can use metric math to set up this alarm, but I am sure the list of operations will keep growing.
I am able to use a SEARCH expression plus an aggregation across the search expression to generate a graph, but it does not let me create an alarm, failing with The expression for an alarm must include at least one metric.
Is there any other way I can achieve this?
Alarming directly on a SEARCH expression is not supported yet. You would have to create a metric math expression in which you list all 3 metrics, then add an expression that takes the max of the 3, like MAX(METRICS()). Make sure only the expression is marked as visible so that there is only one line on the graph.
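For reference, a sketch of what that looks like via the PutMetricAlarm API, using an explicit MAX([m1, m2, m3]) over the three Operation dimensions from the question (threshold and period are placeholders; the boto3 call is left commented out):

```python
# Sketch: one alarm covering the 3 Operation dimensions via metric math.
# Namespace/metric/dimension names come from the question; Period, Stat and
# Threshold are placeholder choices.
def fault_query(qid, operation):
    """Build one MetricDataQuery for the Fault metric of a given Operation."""
    return {
        "Id": qid,
        "MetricStat": {
            "Metric": {
                "Namespace": "SameName",
                "MetricName": "Fault",
                "Dimensions": [
                    {"Name": "Operation", "Value": operation},
                    {"Name": "Program", "Value": "ServiceName"},
                ],
            },
            "Period": 300,
            "Stat": "Sum",
        },
        "ReturnData": False,   # hide the individual lines
    }

metrics = [fault_query(q, op) for q, op in [("m1", "A"), ("m2", "B"), ("m3", "C")]]
# Only the MAX expression is returned, so the alarm evaluates a single line.
metrics.append({"Id": "e1", "Expression": "MAX([m1, m2, m3])", "ReturnData": True})
# import boto3
# boto3.client("cloudwatch").put_metric_alarm(
#     AlarmName="FaultAnyOperation", ComparisonOperator="GreaterThanThreshold",
#     EvaluationPeriods=1, Threshold=0, Metrics=metrics)
```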
As stated by Dejan, alarming on SEARCH expressions isn't supported yet in CloudWatch.
Another limitation is that a metric math expression can reference at most 10 metrics; you can work around that with the newer composite alarms.
If you would consider using a 3rd party service, you can try DataDog.
With DataDog you can import your cloudwatch metrics and set up multi-alarms which follow (and automatically discover) all tags under a specific metric.
There might be other services that offer this kind of feature, but I specifically have experience with this tool.

Publishing table count stats with cloudwatch put-metric-data

I've been tasked with monitoring a data integration task, and I'm trying to figure out the best way to do this using cloudwatch metrics.
The data integration task populates records in 3 database tables. What I'd like to do is publish custom metrics each day, with the number of rows that have been inserted for each table. If the row count for one or more tables is 0, then it means something has gone wrong with the integration scripts, so we need to send alerts.
My question is, how to most logically structure the calls to put-metric-data.
I'm thinking of the data being structured something like this...
Namespace: Integrations/IntegrationProject1
Metric Name: RowCount
Metric Dimensions: "Table1", "Table2", "Table3"
Metric Values: 10, 100, 50
Does this make sense, or should it logically be structured some other way? There is no inherent relationship between the tables, other than that they're all associated with a particular project. What I mean is, I don't want to be inferring some kind of meaningful progression from 10 -> 100 -> 50.
Is this something that can be done with a single call to CloudWatch put-metric-data, or would it need to be 3 separate calls?
Separate calls, I think, would look something like this...
aws cloudwatch put-metric-data --metric-name RowCount --namespace "Integrations/IntegrationProject1" --unit Count --value 10 --dimensions Table=Table1
aws cloudwatch put-metric-data --metric-name RowCount --namespace "Integrations/IntegrationProject1" --unit Count --value 100 --dimensions Table=Table2
aws cloudwatch put-metric-data --metric-name RowCount --namespace "Integrations/IntegrationProject1" --unit Count --value 50 --dimensions Table=Table3
This seems like it should work, but is there some more efficient way I can do this, and combine it into a single call?
Also is there a way I can qualify that the data has a resolution of only 24 hours?
Your structure looks fine to me. Consider having a dimension for your stage: beta|gamma|prod.
This seems like it should work, but is there some more efficient way I can do this, and combine it into a single call?
Not using the AWS CLI, but if you used any SDK e.g. Python Boto3, you can publish up to 20 metrics in a single PutMetricData call.
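For example, a sketch of the three calls above collapsed into one PutMetricData request with boto3 (the call itself is commented out; the snippet only builds the payload):

```python
# Sketch: batching the three RowCount datapoints into a single PutMetricData
# request instead of three CLI invocations.
row_counts = {"Table1": 10, "Table2": 100, "Table3": 50}
metric_data = [
    {
        "MetricName": "RowCount",
        "Dimensions": [{"Name": "Table", "Value": table}],
        "Unit": "Count",
        "Value": value,
    }
    for table, value in row_counts.items()
]
# import boto3
# boto3.client("cloudwatch").put_metric_data(
#     Namespace="Integrations/IntegrationProject1", MetricData=metric_data)
```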
Also is there a way I can qualify that the data has a resolution of only 24 hours?
No. CloudWatch will aggregate the data it receives on your behalf. If you want to see a daily datapoint, you can change the period to 1 day when graphing the metric on the CloudWatch Console.

How can I divide 2 series in Grafana with CloudWatch?

I have 2 series in Grafana coming from CloudWatch (sum of 200s and sum of 400s). I would like to divide one of them by the other one but the function divideSeries is not working.
Grafana's divideSeries function works only with Graphite data sources, so it won't work on CloudWatch series. CloudWatch itself originally had no way to combine series, but with metric math you can divide one series by another (an expression like m1 / m2) and graph the result.
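With metric math you can compute the ratio on the CloudWatch side instead; a minimal boto3 GetMetricData sketch, where the namespace and metric names are placeholders (the call is left commented out):

```python
# Sketch: dividing two CloudWatch series with metric math. "MyApp",
# "Count2xx" and "Count4xx" are assumed names; substitute your own.
def stat_query(qid, metric_name):
    """One MetricDataQuery for the Sum of a status-code count metric."""
    return {
        "Id": qid,
        "MetricStat": {
            "Metric": {"Namespace": "MyApp", "MetricName": metric_name},
            "Period": 300,
            "Stat": "Sum",
        },
        "ReturnData": False,   # inputs are hidden; only the ratio is plotted
    }

queries = [
    stat_query("errors", "Count4xx"),
    stat_query("ok", "Count2xx"),
    {"Id": "ratio", "Expression": "errors / ok", "ReturnData": True},
]
# import boto3
# resp = boto3.client("cloudwatch").get_metric_data(
#     MetricDataQueries=queries, StartTime=start, EndTime=end)
```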