Fluent Bit CloudWatch templating with EKS and Fargate

I have an EKS cluster running purely on Fargate and I'm trying to set up logging to CloudWatch.
I have a lot of [OUTPUT] sections that could be unified using some variables. I'd like to send the logs of each deployment to a single log_stream and separate the log_streams by environment (namespace). With a couple of variables I would only need to write a single [OUTPUT] section.
From what I understand, the new Fluent Bit plugin, cloudwatch_logs, doesn't support templating, but the old cloudwatch plugin does.
I've tried to set up a section like the documentation example:
[OUTPUT]
    Name cloudwatch
    Match *container_name*
    region us-east-1
    log_group_name /eks/$(kubernetes['namespace_name'])
    log_stream_name test_stream
    auto_create_group on
This generates a log_group called fluentbit-default, which according to the README.md is the fallback name used when the variables cannot be parsed.
The old cloudwatch plugin is clearly supported (even though it isn't mentioned in the AWS documentation), because if I replace the variable $(kubernetes['namespace_name']) with any plain string it works perfectly.
Fluent Bit on Fargate manages the INPUT section automatically, so I don't really know which variables reach the OUTPUT section; I suppose the kubernetes variable isn't there, or it has a different name or a different structure.
So my questions are:
Is there a way to get the list of variables (or the input records) that Fargate + Fluent Bit are generating?
Can I solve this in a different way? (I don't want to write more than 30 different [OUTPUT] sections, one for each service/log_stream_name; that would also be difficult to maintain.)
Thanks!

After a few days of testing, I've realised that you need to enable the kubernetes filter so that the kubernetes variables are passed to the cloudwatch plugin.
This is the result: now I can generate the log_group based on the environment label and the log_stream based on the namespace and container names.
filters.conf: |
  [FILTER]
      Name                kubernetes
      Match               *
      Merge_Log           Off
      Buffer_Size         0
      Kube_Meta_Cache_TTL 300s
output.conf: |
  [OUTPUT]
      Name                   cloudwatch
      Match                  *
      region                 eu-west-2
      log_group_name         /aws/eks/cluster/$(kubernetes['labels']['app.environment'])
      log_stream_name        $(kubernetes['namespace_name'])-$(kubernetes['container_name'])
      default_log_group_name /aws/eks/cluster/others
      auto_create_group      true
      log_key                log
Please note that app.environment is not a "standard" label; I've added it to all my deployments. The default_log_group_name is necessary in case that label is not present.
Please note also that if you use log_retention_days and new_log_group_tags, the setup stops working. To be honest, log_retention_days never worked for me even with the new cloudwatch_logs plugin.
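For context, on EKS Fargate the built-in log router reads this configuration from a ConfigMap that has to be named aws-logging in the aws-observability namespace, with the two snippets above as keys under data. A minimal sketch of how it fits together (the values are just the ones shown above):

apiVersion: v1
kind: ConfigMap
metadata:
  name: aws-logging            # fixed name expected by the Fargate log router
  namespace: aws-observability # fixed namespace
data:
  filters.conf: |
    [FILTER]
        Name                kubernetes
        Match               *
        Merge_Log           Off
        Buffer_Size         0
        Kube_Meta_Cache_TTL 300s
  output.conf: |
    [OUTPUT]
        Name                   cloudwatch
        Match                  *
        region                 eu-west-2
        log_group_name         /aws/eks/cluster/$(kubernetes['labels']['app.environment'])
        log_stream_name        $(kubernetes['namespace_name'])-$(kubernetes['container_name'])
        default_log_group_name /aws/eks/cluster/others
        auto_create_group      true
        log_key                log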

Related

How to concatenate the OutputPathPlaceholder with a string with Kubeflow Pipelines?

I am using Kubeflow Pipelines (KFP) with GCP Vertex AI Pipelines, with kfp==1.8.5 (the kfp SDK) and google-cloud-pipeline-components==0.1.7. I'm not sure how to find which version of Kubeflow is used on GCP.
I am building a component (YAML) using Python, inspired by a GitHub issue. I am defining an output like:
outputs=[(OutputSpec(name='drt_model', type='Model'))]
This will be a base output directory for storing a few artifacts on Cloud Storage, such as model checkpoints and the model itself.
I would like to keep one base output directory but add subdirectories depending on the artifact:
<output_dir_base>/model
<output_dir_base>/checkpoints
<output_dir_base>/tensorboard
but I couldn't find how to concatenate OutputPathPlaceholder('drt_model') with a string like '/model'.
How can I append extra folder structure like /model or /tensorboard to the OutputPathPlaceholder that KFP will resolve at run time?
I hadn't realised at first that ConcatPlaceholder accepts both artifacts and strings. This is exactly what I wanted to achieve:
ConcatPlaceholder([OutputPathPlaceholder('drt_model'), '/model'])
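For illustration, here is a rough sketch of how this can be wired into a component built with the v1 structures classes in kfp==1.8.x; the component name, image, command, and argument flag names are placeholders, only the placeholder wiring mirrors the answer above:

from kfp.components.structures import (
    ComponentSpec,
    ContainerImplementation,
    ContainerSpec,
    OutputSpec,
    OutputPathPlaceholder,
    ConcatPlaceholder,
)

component_spec = ComponentSpec(
    name='train-drt-model',
    outputs=[OutputSpec(name='drt_model', type='Model')],
    implementation=ContainerImplementation(
        container=ContainerSpec(
            image='gcr.io/my-project/trainer:latest',  # placeholder image
            command=['python', '-m', 'trainer.task'],  # placeholder entry point
            args=[
                # Each flag receives the base output path with a fixed
                # subdirectory appended at run time.
                '--model-dir',
                ConcatPlaceholder([OutputPathPlaceholder('drt_model'), '/model']),
                '--checkpoint-dir',
                ConcatPlaceholder([OutputPathPlaceholder('drt_model'), '/checkpoints']),
                '--tensorboard-dir',
                ConcatPlaceholder([OutputPathPlaceholder('drt_model'), '/tensorboard']),
            ],
        )
    ),
)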

How to add a Fargate profile to an existing cluster with CDK

I want to add a new Fargate profile to an existing EKS cluster.
The cluster is created in another stack, and in my tenant-specific stack I am importing the EKS cluster via attributes.
self.cluster: Cluster = Cluster.from_cluster_attributes(
    self, 'cluster', cluster_name=cluster,
    open_id_connect_provider=eks_open_id_connect_provider,
    kubectl_role_arn=kubectl_role
)
The error is:
Object of type @aws-cdk/core.Resource is not convertible to @aws-cdk/aws-eks.Cluster
and it appears on this line:
FargateProfile(self, f"tenant-{self.tenant}", cluster=self.cluster, selectors=[Selector(namespace=self.tenant)])
If I try calling
self.cluster.add_fargate_profile(f"tenant-{self.tenant}", selectors=[Selector(namespace=self.tenant)])
I get an error saying that the object self.cluster does not have the attribute add_fargate_profile.
While you might think that something is off with importing the cluster, adding manifests and Helm charts works just fine.
self.cluster.add_manifest(...) <-- this is working
This is not currently possible in CDK.
As per the docs, eks.Cluster.fromClusterAttributes returns an ICluster, while FargateProfile explicitly expects a Cluster.
Currently a FargateCluster can only be created in CDK, not imported.

Vault Telemetry to CloudWatch

I'm trying to stream Vault telemetry through the CloudWatch agent's StatsD interface into CloudWatch metrics; however, the gauge metrics are coming through with prefixes and suffixes based on the instance ID and tags, which makes them impossible to target with IaC-managed CloudWatch alarms.
For instance, the vault.core.unsealed telemetry event comes through as vault_CLOUDWATCH_AGENT_HOSTNAME_core_unsealed_INSTANCE_NAME instead of the vault_core_unsealed I was expecting.
Managing alarms for these metrics with Terraform is impossible because the names are dynamic and depend on whichever instance is currently the cluster leader, which we have no control over.
In the Vault configuration HCL file, I have:
telemetry {
  statsd_address        = "127.0.0.1:8125"
  disable_hostname      = true
  enable_hostname_label = true
}
along with several other combinations of hostname configuration values, and they all seem to produce the same output. Is there a solution I'm missing, or is this just a flaw in the decision to use CloudWatch with StatsD to capture telemetry?
I seem to have gotten the gauge metric names to a usable point with a few non-obvious configuration changes.
In the Vault telemetry stanza, add only the disable_hostname = true property alongside the StatsD address. Adding enable_hostname_label as well simply moves the hostname to a different position in the metric name.
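In other words, keep only this from the stanza in the question:

telemetry {
  statsd_address   = "127.0.0.1:8125"
  disable_hostname = true
}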
The CloudWatch agent configuration has an option to omit hostnames, which can be toggled by appending to or setting a new configuration:
{
  "agent": {
    "omit_hostname": true
  }
}
This will prevent the CloudWatch agent from adding its own labels and suffixes to the gauge metric names and will clean up some of the naming that is produced.
(Optional) Adjust the appended dimensions in the CloudWatch agent configuration. By default, the agent appends the instance ID, image ID, autoscaling group name, and instance type. This may be something you want to keep; however, if you want to do something like IaC-created metric alarms, you may need to remove some dimensions so that the metric names are targetable (able to be found via a direct match). The following can be added to the custom configuration that replaces the default CloudWatch agent configuration if you want to adjust which dimensions are automatically appended to the incoming telemetry.
{
  "metrics": {
    "append_dimensions": {
      "AutoScalingGroupName": "${aws:AutoScalingGroupName}"
    }
  }
}
As long as you know the name of the autoscaling group that the instances are launched under, the gauge metrics coming in from the Vault telemetry will be named predictably enough to target them for IaC purposes.
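Putting the pieces together, a minimal agent configuration combining these changes might look like the sketch below. The statsd block under metrics_collected is an assumption about how the agent's StatsD listener is configured; merge this with whatever else your agent configuration already contains.

{
  "agent": {
    "omit_hostname": true
  },
  "metrics": {
    "metrics_collected": {
      "statsd": {
        "service_address": ":8125"
      }
    },
    "append_dimensions": {
      "AutoScalingGroupName": "${aws:AutoScalingGroupName}"
    }
  }
}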

Where to put common variables for groups in Ansible

We have some scripts to help us set up VPCs with up to 6 VMs in AWS. Now I want to log in to each of these machines. For security reasons we can only access one of them via SSH and then tunnel/proxy through that to the other machines. So in our inventory we have the IP address of the SSH host (we call it Redcarpet) and some other hosts like Elasticsearch, Mongodb and Worker:
#inventory/hosts
[redcarpet]
57.44.113.25
[services]
10.0.1.2
[worker]
10.0.5.77
10.0.1.200
[elasticsearch]
10.0.5.30
[mongodb]
10.0.1.5
Now I need to tell each of the groups, EXCEPT redcarpet, to use certain SSH settings. If these applied to all groups, I would put them in inventory/group_vars/all.yml, but now I will have to put them in:
inventory/group_vars/services.yml
inventory/group_vars/worker.yml
inventory/group_vars/elasticsearch.yml
inventory/group_vars/mongodb.yml
This leads to duplication. Therefore I would like to use include or include_vars to pull one or two variables in from a common file (e.g. inventory/common.yml). However, when I try to do this in any of the group_vars files above, the variables are not picked up. What is the best practice for variables that are common to multiple groups?
If you want to go with the group_vars approach, I would suggest you add another group and add the dependent groups as children of that group.
#inventory/hosts
[redcarpet]
57.44.113.25
[services]
10.0.1.2
[worker]
10.0.5.77
10.0.1.200
[elasticsearch]
10.0.5.30
[mongodb]
10.0.1.5
[redcarpet_deps:children]
mongodb
elasticsearch
worker
services
Now you can have a group_vars file called redcarpet_deps.yml, and those hosts should pick up the vars from there.
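For example, inventory/group_vars/redcarpet_deps.yml could then hold the shared SSH settings once. A rough sketch of tunnelling through the redcarpet host (the remote user name is an assumption):

# inventory/group_vars/redcarpet_deps.yml
# Proxy SSH connections for every child group through the redcarpet bastion.
ansible_user: ec2-user   # assumed remote user
ansible_ssh_common_args: '-o ProxyCommand="ssh -W %h:%p -q ec2-user@57.44.113.25"'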

How to give input to MapReduce jobs which use S3 data from Java code

I know that we can normally pass the parameters while running the jar file on an EC2 instance, but how do we provide the inputs from code?
I am trying this because I want to call my Java code from a JSP, so in the Java code I want to pick up the data directly from S3 and proceed. I tried it like this, but in vain:
DataExtractor.getRelevantData("s3n://syamk/revanthinput/", "999999", "94645", "20120606",
"s3n://revanthufl/gen/testoutput" + "interm");
Here s3n://syamk/revanthinput/ is what I was using as the input and s3n://revanthufl/gen/testoutput as the output, and when running the jar I pass the same strings (s3n://syamk/revanthinput/ and s3n://revanthufl/gen/testoutput) as parameters. But doing it like this from code throws an exception:
[java.lang.IllegalArgumentException: AWS Access Key ID and Secret Access Key must be specified as the username or password (respectively) of a s3n URL, or by setting the fs.s3n.awsAccessKeyId or fs.s3n.awsSecretAccessKey properties (respectively).] with root cause
Based on my usage of Flume, it would appear that you need to format your URL like s3n://AWS_ACCESS_KEY:AWS_SECRET_KEY@syamk/revanthinput/ when calling S3 from within code.
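Alternatively, the exception message itself points at the other option: setting the fs.s3n.* credential properties on the Hadoop Configuration before the job is built. A rough sketch of that approach using the standard MapReduce Job API (I don't know what DataExtractor does internally); the class name, job name, and key values are placeholders, and the paths are the ones from the question:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class S3JobLauncher {
    public static void main(String[] args) throws Exception {
        // Supply the S3 credentials through the Hadoop configuration
        // instead of embedding them in the s3n:// URL.
        Configuration conf = new Configuration();
        conf.set("fs.s3n.awsAccessKeyId", "YOUR_ACCESS_KEY");     // placeholder
        conf.set("fs.s3n.awsSecretAccessKey", "YOUR_SECRET_KEY"); // placeholder

        Job job = Job.getInstance(conf, "s3-input-job");          // placeholder job name
        // Mapper/reducer setup omitted; the point is the credential wiring above.
        FileInputFormat.addInputPath(job, new Path("s3n://syamk/revanthinput/"));
        FileOutputFormat.setOutputPath(job, new Path("s3n://revanthufl/gen/testoutputinterm"));
        job.waitForCompletion(true);
    }
}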