Vault Telemetry to CloudWatch

I'm trying to stream Vault telemetry through the CloudWatch agent's StatsD interface into CloudWatch metrics; however, the gauge metric names are coming through with the agent hostname and instance name embedded in them, which makes the metrics impossible to target from IaC-managed CloudWatch alarms.
For instance, the vault.core.unsealed telemetry event comes through as vault_CLOUDWATCH_AGENT_HOSTNAME_core_unsealed_INSTANCE_NAME instead of the vault_core_unsealed I was expecting.
Managing alarms for these metrics with Terraform is impossible because the names are dynamic and depend on whichever instance is currently the cluster leader, which we have no control over.
In the Vault configuration HCL file, I have:
telemetry {
  statsd_address        = "127.0.0.1:8125"
  disable_hostname      = true
  enable_hostname_label = true
}
along with several other combinations of the hostname configuration values, and they all seem to produce the same output. Is there a solution I'm missing, or is this just a flaw in the decision to use CloudWatch with StatsD to capture telemetry?

I seem to have gotten the gauge metric names to a usable point with a few non-obvious configuration changes.
In the Vault telemetry stanza, set only disable_hostname = true alongside the StatsD address. Adding the hostname label options as well simply moves the hostname to a different position in the metric name.
The CloudWatch agent configuration has an option to omit hostnames, which can be toggled by adding the following to the agent configuration:
{
  "agent": {
    "omit_hostname": true
  }
}
This prevents the CloudWatch agent from adding its own labels and suffixes to the gauge metric names and cleans up some of the naming that is produced.
(Optional) Adjust the appended dimensions in the CloudWatch agent configuration. By default, the agent appends the instance ID, image ID, Auto Scaling group name, and instance type as dimensions. You may want to keep these, but if you want something like IaC-created metric alarms, you may need to remove some dimensions so the metrics can be targeted (found via a direct match). The following can be added to the custom configuration that replaces the default CloudWatch agent configuration if you want to adjust which dimensions are automatically appended to the incoming telemetry.
{
  "metrics": {
    "append_dimensions": {
      "AutoScalingGroupName": "${aws:AutoScalingGroupName}"
    }
  }
}
As long as you know the name of the Auto Scaling group the instances belong to, the gauge metrics coming in from the Vault telemetry will be named predictably enough to target them for IaC purposes, as in the sketch below.
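For example, a minimal Terraform alarm sketch, assuming the metrics land in the default CWAgent namespace under the name vault_core_unsealed and that the Auto Scaling group name is passed in as var.vault_asg_name (all of these may differ in your setup):
# Hypothetical alarm on the Vault seal-status gauge; the namespace, metric name,
# and variable are assumptions based on the setup described above.
resource "aws_cloudwatch_metric_alarm" "vault_unsealed" {
  alarm_name  = "vault-core-unsealed"
  namespace   = "CWAgent"
  metric_name = "vault_core_unsealed"

  dimensions = {
    AutoScalingGroupName = var.vault_asg_name
  }

  statistic           = "Minimum"
  period              = 60
  evaluation_periods  = 3
  comparison_operator = "LessThanThreshold"
  threshold           = 1
  treat_missing_data  = "breaching"
  alarm_description   = "Fires when no instance in the ASG reports an unsealed Vault core."
}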

Related

error creating Application AutoScaling Target: ValidationException: Unsupported service namespace, resource type or scalable dimension

I'm trying to enable ECS autoscaling for some Fargate services and run into the error in the title:
error creating Application AutoScaling Target: ValidationException: Unsupported service namespace, resource type or scalable dimension
The error happens on line 4 here:
resource "aws_appautoscaling_target" "autoscaling" {
max_capacity = var.max_capacity
min_capacity = 1
resource_id = var.resource_id
// <snip... a bunch of other vars not relevant to question>
I call the custom autoscaling module like so:
module "myservice_autoscaling" {
source = "../autoscaling"
resource_id = aws_ecs_service.myservice_worker.id
// <snip... a bunch of other vars not relevant to question>
My service is a normal ECS service block starting with:
resource "aws_ecs_service" "myservice_worker" {
After poking around online, I thought maybe I should construct the "service/clusterName/serviceName" sort of "manually", like so:
resource_id = "service/${var.cluster_name}/${aws_ecs_service.myservice_worker.name}"
But that leads to a different error:
The argument "cluster_name" is required, but no definition was found.
I declared cluster_name in the variables.tf of my calling module (i.e. the myservice ECS module that calls my new autoscaling module), and I have cluster_name in the outputs.tf of the cluster module where we set up the ECS cluster. I must still be missing some linking.
Any ideas? Thanks!
Edit: here's the solution that got it working for me
Yes, you do need to construct the resource_id in the form of "service/yourClusterName/yourServiceName". Mine ended up looking like: "service/${var.cluster_name}/${aws_ecs_service.myservice_worker.name}"
You need to make sure you have access to the cluster name and service name variables. In my case, although I had the variable defined in my ECS service's variables.tf and had added it to my cluster module's outputs.tf, I was failing to pass it down from the root module to the service module. This fixed that:
module "myservice" {
source = "./modules/myservice"
cluster_name = module.cluster.cluster_name // the line I added
(the preceding snippet goes in the main.tf of your root module (a level above your service module)
You are on the right track constructing the "service/${var.cluster_name}/${aws_ecs_service.myservice_worker.name}" string. It looks like you simply aren't referencing the cluster name correctly.
And I have cluster_name in the outputs.tf of our cluster module
So you need to reference that module output instead of referencing a non-existent variable:
"service/${module.my_cluster_module.cluster_name}/${aws_ecs_service.myservice_worker.name}"
Change "my_cluster_module" to whatever name you gave the module that is creating your ECS cluster.

Fluentbit Cloudwatch templating with EKS and Fargate

I have an EKS cluster running purely on Fargate and I'm trying to set up logging to CloudWatch.
I have a lot of [OUTPUT] sections that could be unified with a few variables. I'd like to send the logs of each deployment to a single log_stream and separate the log streams by environment (namespace). With a couple of variables I would only need to write a single [OUTPUT] section.
From what I understand, the new Fluent Bit plugin, cloudwatch_logs, doesn't support templating, but the old cloudwatch plugin does.
I've tried to set up a section like the documentation example:
[OUTPUT]
    Name              cloudwatch
    Match             *container_name*
    region            us-east-1
    log_group_name    /eks/$(kubernetes['namespace_name'])
    log_stream_name   test_stream
    auto_create_group on
This generates a log group called fluentbit-default which, according to the README.md, is the fallback name used when the variables cannot be parsed.
The old cloudwatch plugin does appear to be supported (though it isn't mentioned in the AWS documentation), because if I replace the variable $(kubernetes['namespace_name']) with a plain string it works perfectly.
Fluent Bit on Fargate manages the [INPUT] section automatically, so I don't really know which variables are sent to the [OUTPUT] section; I suppose the kubernetes variable isn't there, or it has a different name or a different structure.
So my questions are:
Is there a way to get the list of variables (or inputs) that Fargate + Fluent Bit are generating?
Can I solve this a different way? (I don't want to write more than 30 different [OUTPUT] sections, one for each service/log_stream_name; that would also be difficult to maintain.)
Thanks!
After a few days of tests, I realised that you need to enable the kubernetes filter for the kubernetes variables to reach the cloudwatch plugin.
This is the result: I can now generate the log_group based on the environment label and the log_stream based on the namespace and container names.
filters.conf: |
    [FILTER]
        Name                kubernetes
        Match               *
        Merge_Log           Off
        Buffer_Size         0
        Kube_Meta_Cache_TTL 300s
output.conf: |
    [OUTPUT]
        Name                   cloudwatch
        Match                  *
        region                 eu-west-2
        log_group_name         /aws/eks/cluster/$(kubernetes['labels']['app.environment'])
        log_stream_name        $(kubernetes['namespace_name'])-$(kubernetes['container_name'])
        default_log_group_name /aws/eks/cluster/others
        auto_create_group      true
        log_key                log
Please note that app.environment is not a "standard" label; I've added it to all my deployments myself. The default_log_group_name is necessary in case that value is not present.
Please note also that if you use log_retention_days or new_log_group_tags, the setup stops working. To be honest, log_retention_days never worked for me even with the new cloudwatch_logs plugin.
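For reference, a minimal sketch of where the app.environment label lives; the deployment name, image, and label value are illustrative, and the label must be on the pod template so the kubernetes filter can expose it as $(kubernetes['labels']['app.environment']):
apiVersion: apps/v1
kind: Deployment
metadata:
  name: my-service                    # illustrative name
spec:
  selector:
    matchLabels:
      app: my-service
  template:
    metadata:
      labels:
        app: my-service
        app.environment: production   # custom label read by the cloudwatch plugin template
    spec:
      containers:
        - name: my-service
          image: my-service:latest    # illustrative image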

How can I configure a specific serialization method to use only for Celery ping?

I have a Celery app which has to be pinged by another app. The other app uses JSON to serialize Celery task parameters, but my app has a custom serialization protocol. When the other app tries to ping my app (app.control.ping), it throws the following error:
"Celery ping failed: Refusing to deserialize untrusted content of type application/x-stjson (application/x-stjson)"
My whole codebase relies on this custom encoding, so I was wondering if there is a way to configure JSON serialization only for this ping while continuing to use the custom encoding for the other tasks.
These are the relevant celery settings:
accept_content = [CUSTOM_CELERY_SERIALIZATION, "json"]
result_accept_content = [CUSTOM_CELERY_SERIALIZATION, "json"]
result_serializer = CUSTOM_CELERY_SERIALIZATION
task_serializer = CUSTOM_CELERY_SERIALIZATION
event_serializer = CUSTOM_CELERY_SERIALIZATION
Changing any of the last 3 to [CUSTOM_CELERY_SERIALIZATION, "json"] causes the app to crash, so that's not an option.
Specs: celery=5.1.2
python: 3.8
OS: Linux docker container
Any help would be much appreciated.
Changing any of the last 3 to [CUSTOM_CELERY_SERIALIZATION, "json"] causes the app to crash, so that's not an option.
That's because result_serializer, task_serializer, and event_serializer don't accept a list, only a single str value, unlike e.g. accept_content.
A list works for accept_content because, with two items, we can check whether the content type of an incoming message matches either one. It can't work for result_serializer: if there were two items, which one should be chosen for the result of task A? Hence the need for a single value.
This means that if you set result_serializer = 'json', it has a global effect: the results of all tasks (the returned values, retrievable by calling e.g. response.get()) are serialized/deserialized with the JSON serializer. That might work for the ping, but it would break tasks whose results can't be directly serialized to/from JSON and really need the custom stjson serializer.
Currently, with celery==5.1.2, a task-specific result_serializer doesn't seem to be possible, so we can't have a single task encoded as 'json' rather than 'stjson' without setting it globally; I assume the same applies to ping.
Open request to add result_serializer option for tasks
A short discussion in another question
Not the best solution, but a workaround: instead of fixing this on your app's side, you can add support for serializing/deserializing content of type 'application/x-stjson' in the other app.
other_app/celery.py
import ast

from celery import Celery
from kombu.serialization import register

# This is just a possible implementation. Replace with the actual
# serializer/deserializer for stjson in your app.
def stjson_encoder(obj):
    return str(obj)

def stjson_decoder(obj):
    obj = ast.literal_eval(obj)
    return obj

register(
    'stjson',
    stjson_encoder,
    stjson_decoder,
    content_type='application/x-stjson',
    content_encoding='utf-8',
)

app = Celery('other_app')
app.conf.update(
    accept_content=['json', 'stjson'],
)
Your app continues to accept and respond in the stjson format, but the other app is now configured to be able to parse that format as well.
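With that registration in place, the ping from the other app should deserialize the reply cleanly. A quick check might look like this (assuming the app instance from other_app/celery.py above and a reachable broker):
# Returns a list of {worker_name: {'ok': 'pong'}} dicts from the responding workers.
replies = app.control.ping(timeout=1.0)
print(replies)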

Building Terraform Custom Variables when using Count for AWS Lambda Functions

I am trying to generate a variable for a Lambda function based on a setting from an API Gateway created at the same time with Terraform. I am using trimprefix and trimsuffix to modify the setting I get from the API Gateway, which I then set as an environment variable to be used by the Lambda function code.
I had this working initially using output statements, since I was originally using modules. I have since decided to move away from modules to simplify the code. My real issue, however, is how to perform the trimprefix and trimsuffix operations when I am also using count.
Here is my original code from when I was still using modules; it successfully created the final invoke_url after trimming "https://" from the beginning and "/default" from the end.
## Obtain the rest_api_id
output "rest_api_id" {
  value = aws_api_gateway_deployment.retaildiscount[count.index].rest_api_id
}

## Trim the https prefix from the invoke URL and store in var.invoke_url_tmp
output "invoke_url_tmp" {
  value = trimprefix(aws_api_gateway_deployment.retaildiscount[count.index].invoke_url, "https://")
}

## Trim the /default suffix from var.invoke_url_tmp and output as var.invoke_url to be used
## by the retailorderprice function
output "invoke_url" {
  value = trimsuffix(var.invoke_url_tmp, "/default")
}
I am now trying to do the same, but using count to create multiple copies of the same Lambda functions and API Gateways (this is to create multiple instances for a lab-style workshop; each function will have a unique name pulled from an aito.tfvars file).
For the life of me I cannot work out how to generate the modified value and link it back to the appropriate function.
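One possible approach, sketched under the assumption that the API Gateway deployments and Lambda functions share the same count (the count variable and function names below are illustrative): compute the trimmed URLs in a local value keyed by index, then reference it with count.index.
locals {
  # Trim "https://" from the start and "/default" from the end of each invoke URL.
  invoke_urls = [
    for d in aws_api_gateway_deployment.retaildiscount :
    trimsuffix(trimprefix(d.invoke_url, "https://"), "/default")
  ]
}

resource "aws_lambda_function" "retailorderprice" {
  count         = var.attendee_count              # assumed to match the API Gateway count
  function_name = "retailorderprice-${count.index}"
  # ... role, handler, runtime, and deployment package settings omitted ...

  environment {
    variables = {
      INVOKE_URL = local.invoke_urls[count.index]
    }
  }
}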

Akka.NET - Cluster and ActorSelection path

I have an akka.net cluster and I want to send a message to actors that are both local and remote, and that all have the path "/user/foobar" (at least locally). Should I use ActorSelection, and what should the path look like in order to target both matching local and remote actors?
It's unclear from the question whether you mean you want to send a message locally within one node in your cluster, or across multiple nodes.
If you just want to send it in one node, you can use an ActorSelection and just send it to whatever the desired actor path is (e.g. /user/*/processingActor). If you want to message across the cluster itself, you'll need to set up a cluster-aware Group router.
See the Akka.NET router configuration docs, which is where you'll define the routees.
In a nutshell, you'll be doing something like this:
# inside akka.actor.deployment HOCON
/some-group-router {
  router = round-robin-group
  routees.paths = ["/user/*/processingActor"]
  nr-of-instances = 3
  cluster {
    enabled = on
    use-role = targetRoleName
    allow-local-routees = on
  }
}
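A rough sketch of using that deployment from code; the actor system name and message type are illustrative, and the HOCON above (plus your cluster configuration) is assumed to be loaded into the system's config:
using Akka.Actor;
using Akka.Routing;

// Hypothetical message type delivered to the routees.
public sealed class DoWork { }

public static class Program
{
    public static void Main()
    {
        var system = ActorSystem.Create("my-cluster-system");

        // FromConfig picks up the round-robin-group deployment defined at /some-group-router.
        var router = system.ActorOf(
            Props.Empty.WithRouter(FromConfig.Instance),
            "some-group-router");

        // Messages are round-robined across matching local and remote routees.
        router.Tell(new DoWork());
    }
}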