Define a condition in a CloudFormation template with alarms - conditional-statements

How do I define/declare a condition so that an alarm is only created in prod?
Would adding Condition: IsProd to the resource, as below, work to create the alarm in prod?
And how should the condition itself be defined?
LambdaInvocationsAlarm:
  Condition: IsProd
  Type: AWS::CloudWatch::Alarm
  Properties:
    AlarmDescription: Lambda invocations
    AlarmName: LambdaInvocationsAlarm
    ComparisonOperator: LessThanLowerOrGreaterThanUpperThreshold
    EvaluationPeriods: 1
    Metrics:
      - Expression: ANOMALY_DETECTION_BAND(m1, 2)
        Id: ad1
      - Id: m1
        MetricStat:
          Metric:
            MetricName: Invocations
            Namespace: AWS/Lambda
          Period: !!int 86400
          Stat: Sum
    ThresholdMetricId: ad1
    TreatMissingData: breaching

As @Marcin said, you should explain more precisely what you have tried and what is blocking you.
But yes, what you suggest could work: you can define a Condition named isProd and use it to create - or not create - resources. Regarding this condition: AWS does not know what a production stage is in your environment, so you need to specify that. Does your production stage match an account? Does it match a region? Something else?
As an example, and if we assume that your production stage matches a specific AWS account, you could define the condition as below (it's JSON, feel free to convert to YAML):
{
  "Parameters": {
    "ProdAccountParameter": {
      "Type": "String",
      "Description": "Enter the production account identifier."
    }
  },
  "Conditions": {
    "isProd": {
      "Fn::Equals": [
        {
          "Ref": "ProdAccountParameter"
        },
        {
          "Ref": "AWS::AccountId"
        }
      ]
    }
  },
  ...
}
(Then, when deploying the template, you'll need to provide your AWS production account ID as the parameter value.)
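For convenience, a YAML equivalent of the same parameter and condition (a direct translation of the JSON above) could look like this:

Parameters:
  ProdAccountParameter:
    Type: String
    Description: Enter the production account identifier.

Conditions:
  isProd: !Equals [!Ref ProdAccountParameter, !Ref "AWS::AccountId"]

Note that condition names are case-sensitive, so the Condition: key on the alarm resource must match the name exactly (the question uses IsProd, the example above uses isProd).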

Related

OPA authorization policies with scopes and roles

I'm using Open Policy Agent as an authorization component together with OIDC-enabled apps.
I have input from the apps in the format:
{
  "token": {
    "scopes": [
      "read:books",
      "write:books"
    ]
  },
  "principal": {
    "roles": [
      "user",
      "moderator"
    ]
  },
  "context": {
    "action": "read",
    "resource": "books"
  }
}
Then I have data with access mapping in the format:
{
  "user": [
    "read:books"
  ],
  "moderator": [
    "read:books",
    "write:books"
  ],
  "administrator": [
    "read:books",
    "write:books",
    "read:store",
    "write:store"
  ]
}
And the policy currently looks like this:
package whatever.authz

context_scope := concat(":", [input.context.action, input.context.resource])

default allow = false

allow {
    token_has_context_scope
    principal_has_resource_access
}

token_has_context_scope {
    context_scope == input.token.scopes[_]
}

principal_has_resource_access {
    principal_role := input.principal.roles[_]
    context_scope == data[principal_role][_]
}
This produces the following error:
2 errors occurred:
policy.rego:16: rego_recursion_error: rule principal_has_resource_access is recursive: principal_has_resource_access -> principal_has_resource_access
policy.rego:7: rego_recursion_error: rule allow is recursive: allow -> principal_has_resource_access -> allow
It is the recursive lookup in the principal_has_resource_access function that is causing the error.
I need to check whether one of the principal's roles is allowed to access the resource specified by the context. Since roles is an array, I need to take the union of all access scopes in the data and see if one of them matches the context scope. What am I doing wrong in the policy?
The snippet can be found in the Rego Playground https://play.openpolicyagent.org/p/KhovLRgMup
OPA stores all data under the data path, including policy and rules. There's no way for the compiler to know that the input you're providing isn't referencing the policy itself (i.e. data["whatever"]), which would be recursive. The easiest way to work around this is to simply use a top-level attribute for your data that differs from your policy (i.e. the package name), like this:
{
  "attributes": {
    "user": [
      "read:books"
    ],
    "moderator": [
      "read:books",
      "write:books"
    ],
    "administrator": [
      "read:books",
      "write:books",
      "read:store",
      "write:store"
    ]
  }
}
And update your policy to reference this:
context_scope == data["attributes"][principal_role][_]
Since data.attributes != data.whatever.authz there is no risk of recursion, and the compiler won't complain. You might want a better name than "attributes", but I'll leave that to you :)
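For reference, here is the full policy from the question with that single change applied (assuming the access-mapping data is keyed under attributes as shown above):

package whatever.authz

context_scope := concat(":", [input.context.action, input.context.resource])

default allow = false

allow {
    token_has_context_scope
    principal_has_resource_access
}

token_has_context_scope {
    context_scope == input.token.scopes[_]
}

principal_has_resource_access {
    principal_role := input.principal.roles[_]
    context_scope == data.attributes[principal_role][_]
}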

How to use input transformer for ECS Fargate launch type with Terraform CloudWatch event trigger

I'm using Terraform to create a CloudWatch Event trigger with an ECS Fargate launch type, where the event source is S3. When I use the input_transformer field to pass the bucket and key into the ECS task, my event rule results in a failed invocation.
This is the aws_cloudwatch_event_rule:
resource "aws_cloudwatch_event_rule" "event_rule" {
name = "dev-gnss-source-put-rule-tf"
description = "Capture S3 events on uploads bucket"
event_pattern = <<PATTERN
{
"source": [
"aws.s3"
],
"detail-type": [
"AWS API Call via CloudTrail"
],
"detail": {
"eventSource": [
"s3.amazonaws.com"
],
"eventName": [
"PutObject"
],
"requestParameters": {
"bucketName": [
"example-bucket-name"
]
}
}
}
PATTERN
}
This is the aws_cloudwatch_event_target:
resource "aws_cloudwatch_event_target" "event_target" {
target_id = "dev-gnss-upload-event-target-tf"
arn = "example-cluster-arn"
rule = aws_cloudwatch_event_rule.event_rule.name
role_arn = aws_iam_role.uploads_events.arn
ecs_target {
launch_type = "FARGATE"
task_count = 1 # Launch one container / event
task_definition_arn = "example-task-definition-arn"
network_configuration {
subnets = ["example-subnet"]
security_groups = []
}
}
input_transformer {
input_paths = {
s3_bucket = "$.detail.requestParameters.bucketName"
s3_key = "$.detail.requestParameters.key"
}
input_template = <<TEMPLATE
{
"containerOverrides": [
{
"name": "myproject-task",
"environment": [
{ "name": "S3_BUCKET", "value": <s3_bucket> },
{ "name": "S3_KEY", "value": <s3_key> }
]
}
]
}
TEMPLATE
}
}
If I remove the input_transformer section, it works fine, but I need to pass in the S3 bucket and key to process the particular file.
My rationale for doing this is to remove the need for an intermediary Lambda; I was guided by this Medium post: https://medium.com/@bowbaq/trigger-an-ecs-job-when-an-s3-upload-completes-3559c44c37d1
Any advice is appreciated.
After hours of going in circles, I found an answer!
So the first step is to check what the cause of the failed invocation is. You can do this by checking the CloudTrail logs: navigate to CloudTrail > Event history, search by Event name, and type RunTask in the search box. You should see a series of events from the event source ecs.amazonaws.com. Find the one that relates to the failed invocation you experienced.
When you click into the event, you can see an errorMessage under the Event record section. In my case, it was the following:
"errorCode": "InvalidParameterException",
"errorMessage": "Override for container named myproject-task is not a container in the TaskDefinition.",
This may be different for you. For me, it was because my containerOverrides name was incorrect. This field refers to "the name of the container that receives the override. This parameter is required if any override is specified." (ref: https://docs.aws.amazon.com/AmazonECS/latest/APIReference/API_ContainerOverride.html)
Correcting this field fixed my issue.
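For illustration, the corrected block keeps the original input_transformer and only changes the override name so it matches the container name declared in the task definition's container_definitions (the name myproject-container below is hypothetical):

  input_transformer {
    input_paths = {
      s3_bucket = "$.detail.requestParameters.bucketName"
      s3_key    = "$.detail.requestParameters.key"
    }

    # "name" must equal the container name declared in the task definition's
    # container_definitions, not the task definition family name.
    input_template = <<TEMPLATE
{
  "containerOverrides": [
    {
      "name": "myproject-container",
      "environment": [
        { "name": "S3_BUCKET", "value": <s3_bucket> },
        { "name": "S3_KEY", "value": <s3_key> }
      ]
    }
  ]
}
TEMPLATE
  }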

Fargate environment variable redis.yaml

I have a microservice and I need to pass in a file redis.yaml to configure Elasticache for Redis.
Assume I have a file called redis.yaml with contents:
clusterServersConfig:
  idleConnectionTimeout: 10000
  pingTimeout: 1000
  connectTimeout: 10000
  timeout: 60000
  retryAttempts: 3
  retryInterval: 60000
And in my application.properties I use:
redis.config.location=file:/opt/usr/conf/redis.yaml
In Kubernetes, I can just create a secret with --from-file redis.yaml and the application runs properly.
I do not know how to do the same with AWS Fargate. I believe it could be done with AWS SSM but any help/steps on how to do it would be appreciated.
For externalized configuration, Fargate supports environment variables, which can be passed in the task definition.
"environment": [
{ "name": "env_name1", "value": "value1" },
{ "name": "env_name2", "value": "value2" }
]
If it's sensitive information, store it in the AWS SSM Parameter Store (you can use KMS) and reference the parameter in the task definition:
{
  "containerDefinitions": [{
    "secrets": [{
      "name": "environment_variable_name",
      "valueFrom": "arn:aws:ssm:region:aws_account_id:parameter/parameter_name"
    }]
  }]
}
In your case, you can convert your YAML to JSON, store it in the Parameter Store, and refer to it in the task definition.
https://docs.aws.amazon.com/AmazonECS/latest/developerguide/specifying-sensitive-data.html
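As an illustration, the redis.yaml from the question converts to the following JSON, which could be stored as a single parameter value and referenced from the task definition as shown above:

{
  "clusterServersConfig": {
    "idleConnectionTimeout": 10000,
    "pingTimeout": 1000,
    "connectTimeout": 10000,
    "timeout": 60000,
    "retryAttempts": 3,
    "retryInterval": 60000
  }
}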

Can I customize partitioning in Kinesis Firehose before delivering to S3?

I have a Firehose stream that is intended to ingest millions of events from different sources and of different event types. The stream should deliver all data to one S3 bucket as a store of raw/unaltered data.
I was thinking of partitioning this data in S3 based on metadata embedded within the event message, like event-source, event-type, and event-date.
However, Firehose follows its default partitioning based on record arrival time. Is it possible to customize this partitioning behavior to fit my needs?
Update: the accepted answer has been updated, as a new answer notes the feature is available as of September 2021.
As of writing this, the dynamic partitioning feature Vlad mentioned is still pretty new. I needed it to be part of a CloudFormation template, which was still not properly documented, and I had to add DynamicPartitioningConfiguration to get it working properly. The MetadataExtractionQuery syntax was also not properly documented.
MyKinesisFirehoseStream:
  Type: AWS::KinesisFirehose::DeliveryStream
  ...
  Properties:
    ExtendedS3DestinationConfiguration:
      Prefix: "clients/client_id=!{client_id}/dt=!{timestamp:yyyy-MM-dd}/"
      ErrorOutputPrefix: "errors/!{firehose:error-output-type}/"
      DynamicPartitioningConfiguration:
        Enabled: "true"
        RetryOptions:
          DurationInSeconds: "300"
      ProcessingConfiguration:
        Enabled: "true"
        Processors:
          - Type: AppendDelimiterToRecord
          - Type: MetadataExtraction
            Parameters:
              - ParameterName: MetadataExtractionQuery
                ParameterValue: "{client_id:.client_id}"
              - ParameterName: JsonParsingEngine
                ParameterValue: JQ-1.6
Since September 1st, 2021, AWS Kinesis Firehose supports this feature. Read the announcement blog post here.
From the documentation:
You can use the Key and Value fields to specify the data record parameters to be used as dynamic partitioning keys and jq queries to generate dynamic partitioning key values. ...
No. You cannot 'partition' based upon event content.
Some options are:
Send to separate Firehose streams
Send to a Kinesis Data Stream (instead of Firehose) and write your own custom Lambda function to process and save the data (See: AWS Developer Forums: Athena and Kinesis Firehose)
Use Kinesis Analytics to process the message and 'direct' it to different Firehose streams
If you are going to use the output with Amazon Athena or Amazon EMR, you could also consider converting it into Parquet format, which has much better performance. This would require post-processing of the data in S3 as a batch rather than converting the data as it arrives in a stream.
To build on John's answer, if you don't have near-real-time streaming requirements, we've found batch processing with Athena to be a simple solution for us.
Kinesis streams to a given table, unpartitioned_event_data, which can make use of the native record-arrival-time partitioning.
We define another Athena table, partitioned_event_table, with custom partition keys, and make use of the INSERT INTO capabilities that Athena has. Athena will automatically repartition your data in the format you want without requiring any custom consumers or new infrastructure to manage. This can be scheduled with cron, SNS, or something like Airflow.
What's cool is that you can create a view that does a UNION of the two tables, so you can query historical and real-time data in one place.
We actually dealt with this problem at Radar and talk about more trade-offs in this blog post.
To expand on Murali's answer, we have implemented it in CDK.
Our incoming JSON data looks something like this:
{
  "data": {
    "timestamp": 1633521266990,
    "defaultTopic": "Topic",
    "data": {
      "OUT1": "Inactive",
      "Current_mA": 3.92
    }
  }
}
The CDK code looks like the following:
const DeliveryStream = new CfnDeliveryStream(this, 'deliverystream', {
  deliveryStreamName: 'deliverystream',
  extendedS3DestinationConfiguration: {
    cloudWatchLoggingOptions: {
      enabled: true,
    },
    bucketArn: Bucket.bucketArn,
    roleArn: deliveryStreamRole.roleArn,
    prefix: 'defaultTopic=!{partitionKeyFromQuery:defaultTopic}/!{timestamp:yyyy/MM/dd}/',
    errorOutputPrefix: 'error/!{firehose:error-output-type}/',
    bufferingHints: {
      intervalInSeconds: 60,
    },
    dynamicPartitioningConfiguration: {
      enabled: true,
    },
    processingConfiguration: {
      enabled: true,
      processors: [
        {
          type: 'MetadataExtraction',
          parameters: [
            {
              parameterName: 'MetadataExtractionQuery',
              parameterValue: '{Topic: .data.defaultTopic}',
            },
            {
              parameterName: 'JsonParsingEngine',
              parameterValue: 'JQ-1.6',
            },
          ],
        },
        {
          type: 'AppendDelimiterToRecord',
          parameters: [
            {
              parameterName: 'Delimiter',
              parameterValue: '\\n',
            },
          ],
        },
      ],
    },
  },
})
My scenario is:
Firehose needs to send data to S3, which is tied to a Glue table, with Parquet as the format and dynamic partitioning enabled, since I want to partition by the year, month, and day from the data I push to Firehose instead of using the default.
Below is the working code:
rawdataFirehose:
  Type: AWS::KinesisFirehose::DeliveryStream
  Properties:
    DeliveryStreamName: !Join ["-", [rawdata, !Ref AWS::StackName]]
    DeliveryStreamType: DirectPut
    ExtendedS3DestinationConfiguration:
      BucketARN: !GetAtt rawdataS3bucket.Arn
      Prefix: parquetdata/year=!{partitionKeyFromQuery:year}/month=!{partitionKeyFromQuery:month}/day=!{partitionKeyFromQuery:day}/
      BufferingHints:
        IntervalInSeconds: 300
        SizeInMBs: 128
      ErrorOutputPrefix: errors/
      RoleARN: !GetAtt FirehoseRole.Arn
      DynamicPartitioningConfiguration:
        Enabled: true
      ProcessingConfiguration:
        Enabled: true
        Processors:
          - Type: MetadataExtraction
            Parameters:
              - ParameterName: MetadataExtractionQuery
                ParameterValue: "{year:.year,month:.month,day:.day}"
              - ParameterName: "JsonParsingEngine"
                ParameterValue: "JQ-1.6"
      DataFormatConversionConfiguration:
        Enabled: true
        InputFormatConfiguration:
          Deserializer:
            HiveJsonSerDe: {}
        OutputFormatConfiguration:
          Serializer:
            ParquetSerDe: {}
        SchemaConfiguration:
          CatalogId: !Ref AWS::AccountId
          RoleARN: !GetAtt FirehoseRole.Arn
          DatabaseName: !Ref rawDataDB
          TableName: !Ref rawDataTable
          Region:
            Fn::ImportValue: AWSRegion
          VersionId: LATEST

FirehoseRole:
  Type: AWS::IAM::Role
  Properties:
    AssumeRolePolicyDocument:
      Version: 2012-10-17
      Statement:
        - Effect: Allow
          Principal:
            Service: firehose.amazonaws.com
          Action: sts:AssumeRole
    Policies:
      - PolicyName: !Sub firehose-glue-${Envname}
        PolicyDocument: |
          {
            "Version": "2012-10-17",
            "Statement":
              [
                {
                  "Effect": "Allow",
                  "Action":
                    [
                      "glue:*",
                      "iam:ListRolePolicies",
                      "iam:GetRole",
                      "iam:GetRolePolicy",
                      "tag:GetResources",
                      "s3:*",
                      "cloudwatch:*",
                      "ssm:*"
                    ],
                  "Resource": "*"
                }
              ]
          }
Note:
rawDataDB is a reference to the Glue database
rawDataTable is a reference to the Glue table
rawdataS3bucket is a reference to the S3 bucket

AWS data pipeline activity with multiple inputs

As part of an Amazon AWS Data Pipeline, I have a Hive activity that uses two unstaged S3 data nodes as input. What I want is to be able to set two script variables on the activity, each pointing to an input data node, but I can't get the syntax right. With a single input, I could write the following and it would work just fine:
INPUT_FOO=#{input.directoryPath}
When I add the second input, I run into the problem of how to reference them, since they are now an array of inputs, as you can see in the pipeline definition below. Essentially, I want to achieve the following but can't figure out the correct syntax:
INPUT_FOO=#{input[1].directoryPath}
INPUT_BAR=#{input[2].directoryPath}
Here's the activity portion of the pipeline definition:
{
  "id": "ActivityId_7u1sR",
  "input": [
    {
      "ref": "DataNodeId_iYnxf"
    },
    {
      "ref": "DataNodeId_162Ka"
    }
  ],
  "schedule": {
    "ref": "DefaultSchedule"
  },
  "scriptUri": "#{myS3ScriptLocation}calculate-results.q",
  "name": "Perform Calculations",
  "runsOn": {
    "ref": "EmrClusterId_jHeiV"
  },
  "scriptVariable": [
    "INPUT_SOURCE1=#{input[1].directoryPath}",
    "OUTPUT=#{output.directoryPath}Results/",
    "INPUT_SOURCE2=#{input[2].directoryPath}"
  ],
  "output": {
    "ref": "DataNodeId_2jY6v"
  },
  "type": "HiveActivity",
  "stage": "false"
}
I plan to keep the tables unstaged and take care of table creation in the hive script so that it's easier to run each Hive activity in isolation as well as in the pipeline itself.
Here's the error I see when using array syntax:
Unable to resolve input[1].directoryPath for object ActivityId_7u1sR'
As it stands now, this scenario is not supported, but a feature request was added to support it in the future.