Amazon Step Function with a Lambda triggered by Kinesis

So I am trying to create a simple pipeline in AWS: I want to execute a Step Functions state machine on data generated by a stream, where the stream triggers the first Lambda of the state machine.
What I want to do is the following:
Input data is streamed by AWS Kinesis.
The Kinesis stream is used as a trigger for lambda1, which executes and writes to an S3 bucket.
This would trigger (via the step function) lambda2, which reads the content from the given bucket and writes it to another bucket.
Now I want to implement the state machine using AWS Step Functions. I have created the state machine, which is quite straightforward:
{
  "Comment": "Linear step function test",
  "StartAt": "lambda1",
  "States": {
    "lambda1": {
      "Type": "Task",
      "Resource": "arn:....",
      "Next": "lambda2"
    },
    "lambda2": {
      "Type": "Task",
      "Resource": "arn:...",
      "End": true
    }
  }
}
What I want is for Kinesis to trigger the first Lambda and, once it has executed, for the step function to execute lambda2. This does not seem to happen: the step function does nothing, even though lambda1 is triggered by the stream and writes to the S3 bucket. I have the option to manually start a new execution and pass a JSON as input, but that is not the workflow I am looking for.

You are kicking off the state machine the wrong way.
You need to add another starter Lambda function that uses the SDK to invoke the state machine, as shown in the sketch below. The process looks like this:
Kinesis -> starter (Lambda) -> state machine (runs Lambda 1 and Lambda 2)
The problem with Step Functions is the lack of triggers. There are only three ways to start an execution: CloudWatch Events, the SDK, or API Gateway.
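A minimal sketch of such a starter Lambda in Python (boto3), assuming the state machine ARN is supplied through a hypothetical STATE_MACHINE_ARN environment variable:

import base64
import json
import os

import boto3

sfn = boto3.client("stepfunctions")

# Hypothetical environment variable holding the state machine ARN.
STATE_MACHINE_ARN = os.environ["STATE_MACHINE_ARN"]

def handler(event, context):
    """Starter Lambda: triggered by the Kinesis stream, it decodes each
    record and starts one state machine execution per record."""
    for record in event["Records"]:
        payload = base64.b64decode(record["kinesis"]["data"]).decode("utf-8")
        sfn.start_execution(
            stateMachineArn=STATE_MACHINE_ARN,
            # The execution input becomes the input of the first state (lambda1).
            input=json.dumps({"data": payload}),
        )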

Related

Error exporting state transition results to S3

Enabling the distributed Map state and exporting the state transition history to S3 triggers an exception with the following error message:
An error occurred while executing the state 'Map' (entered at the event id #12). Failed to write a test manifest into the specified output bucket. | Message from S3: The bucket you are attempting to access must be addressed using the specified endpoint. Please send all future requests to this endpoint
This is my ResultWriter object definition:
/// other keys omitted...
"ResultWriter": {
  "Resource": "arn:aws:states:::s3:putObject",
  "Parameters": {
    "Bucket": "sds-qa-nv",
    "Prefix": "distributed_excecutions/"
  }
}
I tried enabling "Export Map state results to Amazon S3" to save the state transitions, and I expect the results to be saved to S3 without failing.

How is data in Kinesis decrypted before hitting S3?

I currently have an architecture where data flows Kinesis -> Kinesis Firehose -> S3.
I am creating records directly in Kinesis using:
aws kinesis put-record --stream-name <some_kinesis_stream> --partition-key 123 --data testdata --profile sandbox
The data when I run:
aws kinesis get-records --shard-iterator --profile sandbox
looks like this:
{
  "SequenceNumber": "49597697038430366340153578495294928515816248592826368002",
  "ApproximateArrivalTimestamp": 1563835989.441,
  "Data": "eyJrZXkiOnsiZW1wX25vIjo1Mjc2OCwiZGVwdF9ubyI6ImQwMDUifSwidmFsdWUiOnsiYmVmb3JlIjpudWxsLCJhZnRlciI6eyJlbXBfbm8iOjUyNzY4LCJkZXB0X25vIjoiZDAwNSIsImZyb21fZGF0ZSI6Nzk2NSwidG9fZGF0ZSI6MjkzMjUzMX0sInNvdXJjZSI6eyJ2ZXJzaW9uIjoiMC45LjUuRmluYWwiLCJjb25uZWN0b3IiOiJteXNxbCIsIm5hbWUiOiJraW5lc2lzIiwic2VydmVyX2lkIjowLCJ0c19zZWMiOjAsImd0aWQiOm51bGwsImZpbGUiOiJteXNxbC1iaW4tY2hhbmdlbG9nLjAwMDAwMiIsInBvcyI6MTU0LCJyb3ciOjAsInNuYXBzaG90Ijp0cnVlLCJ0aHJlYWQiOm51bGwsImRiIjoiZW1wbG95ZWVzIiwidGFibGUiOiJkZXB0X2VtcCIsInF1ZXJ5IjpudWxsfSwib3AiOiJjIiwidHNfbXMiOjE1NjM4MzEzMTI2Njh9fQ==",
  "PartitionKey": "-591791328"
}
but in s3, it looks like:
`testdatatestdatatestdatatestdatatestdatatestdatatestdatatestdata`
because I ran put-record several times.
So what is going on? When I run get-records, which records am I obtaining? What is that data, and how is it then decrypted into my original string?
This is 15 days old now, so hopefully you found the answer already.
If not: it seems the mismatch between get-records and what you see in S3 comes down to how you performed the aws kinesis get-records --shard-iterator --profile sandbox call; you didn't explicitly provide a shard iterator value.
What you saw in S3 is correct and expected based on your --data testdata put-record calls.
testdatatestdatatestdatatestdatatestdatatestdatatestdatatestdata
What you saw in Kinesis is base64 encoded:
"Data": "eyJrZXkiOnsiZW1wX25vIjo1Mjc2OCwiZGVwdF9ubyI6ImQwMDUifSwidmFsdWUiOnsiYmVmb3JlIjpudWxsLCJhZnRlciI6eyJlbXBfbm8iOjUyNzY4LCJkZXB0X25vIjoiZDAwNSIsImZyb21fZGF0ZSI6Nzk2NSwidG9fZGF0ZSI6MjkzMjUzMX0sInNvdXJjZSI6eyJ2ZXJzaW9uIjoiMC45LjUuRmluYWwiLCJjb25uZWN0b3IiOiJteXNxbCIsIm5hbWUiOiJraW5lc2lzIiwic2VydmVyX2lkIjowLCJ0c19zZWMiOjAsImd0aWQiOm51bGwsImZpbGUiOiJteXNxbC1iaW4tY2hhbmdlbG9nLjAwMDAwMiIsInBvcyI6MTU0LCJyb3ciOjAsInNuYXBzaG90Ijp0cnVlLCJ0aHJlYWQiOm51bGwsImRiIjoiZW1wbG95ZWVzIiwidGFibGUiOiJkZXB0X2VtcCIsInF1ZXJ5IjpudWxsfSwib3AiOiJjIiwidHNfbXMiOjE1NjM4MzEzMTI2Njh9fQ==",
So decoding gets you:
{
  "key": {
    "emp_no": 52768,
    "dept_no": "d005"
  },
  "value": {
    "before": null,
    "after": {
      "emp_no": 52768,
      "dept_no": "d005",
      "from_date": 7965,
      "to_date": 2932531
    },
    "source": {
      "version": "0.9.5.Final",
      "connector": "mysql",
      "name": "kinesis",
      "server_id": 0,
      "ts_sec": 0,
      "gtid": null,
      "file": "mysql-bin-changelog.000002",
      "pos": 154,
      "row": 0,
      "snapshot": true,
      "thread": null,
      "db": "employees",
      "table": "dept_emp",
      "query": null
    },
    "op": "c",
    "ts_ms": 1563831312668
  }
}
The reason it didn't match your "testdata" is that you were looking at the wrong shard iterator, possibly on the wrong shard; I'm unsure what your Kinesis setup is exactly.
Give this article a once-over: https://docs.aws.amazon.com/streams/latest/dev/fundamental-stream.html. It should give you the steps to test this workflow.
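If it helps, here is a small boto3 sketch (the stream name is a placeholder) that picks a shard explicitly, fetches records, and prints the decoded payload. Note that the SDK returns Data already base64-decoded, while the CLI shows the raw base64 string:

import boto3

kinesis = boto3.client("kinesis")

# Placeholder stream name; replace with your own.
stream = "some_kinesis_stream"

# Pick a shard explicitly instead of relying on a blank --shard-iterator.
shard_id = kinesis.describe_stream(StreamName=stream)["StreamDescription"]["Shards"][0]["ShardId"]
iterator = kinesis.get_shard_iterator(
    StreamName=stream,
    ShardId=shard_id,
    ShardIteratorType="TRIM_HORIZON",  # read from the oldest available record
)["ShardIterator"]

for record in kinesis.get_records(ShardIterator=iterator)["Records"]:
    # boto3 base64-decodes Data for you, so this prints the original bytes.
    print(record["Data"].decode("utf-8"))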
It seems that you've configured your Firehose to enable server-side data encryption. If this is the case, then the following applies:
When you configure a Kinesis data stream as the data source of a Kinesis Data Firehose delivery stream, Kinesis Data Firehose no longer stores the data at rest. Instead, the data is stored in the data stream.
When you send data from your data producers to your data stream, Kinesis Data Streams encrypts your data using an AWS Key Management Service (AWS KMS) key before storing the data at rest. When your Kinesis Data Firehose delivery stream reads the data from your data stream, Kinesis Data Streams first decrypts the data and then sends it to Kinesis Data Firehose. Kinesis Data Firehose buffers the data in memory based on the buffering hints that you specify. It then delivers it to your destinations without storing the unencrypted data at rest.
Find out more at: https://docs.aws.amazon.com/firehose/latest/dev/encryption.html

How to achieve idempotent behavior in a Lambda that writes to S3?

I have an AWS Lambda that does some calculation and writes its output to an S3 location. The Lambda is triggered by a CloudWatch cron expression. Since the Lambda can be triggered multiple times, I want to modify the Lambda code so that it handles multiple triggers.
The only major side effects of my Lambda are writing to S3 and sending a mail. In this case, how do I ensure the Lambda can execute multiple times while still behaving idempotently?
You need a unique id and a check on it before processing.
See if the event has one. For example, the sample event for a "cron expression" CloudWatch rule suggests that you will get something like this:
{
  "version": "0",
  "id": "89d1a02d-5ec7-412e-82f5-13505f849b41",
  "detail-type": "Scheduled Event",
  "source": "aws.events",
  "account": "123456789012",
  "time": "2016-12-30T18:44:49Z",
  "region": "us-east-1",
  "resources": [
    "arn:aws:events:us-east-1:123456789012:rule/SampleRule"
  ],
  "detail": {}
}
In this case your code would read event.id and write that to S3 (say yourbucket/running/89d1a02d-5ec7-412e-82f5-13505f849b41). When the Lambda is initiated, it can list the keys under yourbucket/running and see whether event.id matches any of them.
If none matches, create the key and do the work.
It's not a bullet-proof solution: you might conceivably run into a race condition, say if another Lambda fires up while AWS is slow at creating the key but fast at launching the other Lambda. But that is what you have to live with if you would like to use S3.
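A minimal sketch of that check in Python (boto3); the bucket name and prefix are placeholders, and the race window described above still applies:

import boto3
from botocore.exceptions import ClientError

s3 = boto3.client("s3")

# Placeholder bucket and prefix for the marker keys described above.
BUCKET = "yourbucket"
PREFIX = "running/"

def handler(event, context):
    marker = PREFIX + event["id"]  # the scheduled event's unique id
    try:
        # If the marker already exists, this invocation is a duplicate.
        s3.head_object(Bucket=BUCKET, Key=marker)
        return "already processed"
    except ClientError as err:
        if err.response["Error"]["Code"] != "404":
            raise
    # Claim the event id, then do the real work (S3 write, mail, ...).
    s3.put_object(Bucket=BUCKET, Key=marker, Body=b"")
    # ... actual processing goes here ...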

Exporting CloudWatch logs in their original format

I am looking for a way to export CloudWatch logs in their original form to S3. I used the console to export a day's worth of logs from a log group, and it seems that a timestamp was prepended to each line, breaking the original JSON formatting. I was looking to import this into Glue as a JSON file for a test transformation script. The original data is formatted as a normal JSON string when imported into CloudWatch; when I normally process the data it looks like:
{ "a": 123, "b": "456", "c": 789 }
After exporting and decompressing the data it looks like
2019-06-28T00:00:00.099Z { "a": 123, "b": "456", "c": 789 }
This breaks reading the line as a JSON string, since it is no longer in a standard format.
The dataset is fairly large (100 GB+) for this run, and it will possibly grow larger in the future, so running a CLI command and processing each line locally isn't feasible in my opinion. Is there any known way to do what I am looking to do?
Thank you
Timestamps are automatically added when you push logs to CloudWatch.
Every log event present in CloudWatch has a timestamp.
You can create a subscription filter to Kinesis Data Firehose and, using a Lambda function on the delivery stream, format the log events (remove the timestamp) before storing the logs in S3.
https://docs.aws.amazon.com/AmazonCloudWatch/latest/logs/Subscriptions.html
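For example, a sketch of such a Firehose transformation Lambda; it assumes the records come from a CloudWatch Logs subscription filter, whose payloads are gzip-compressed and base64-encoded inside each Firehose record:

import base64
import gzip
import json

def handler(event, context):
    """Firehose data-transformation sketch: keep only the original log
    messages so the objects delivered to S3 contain plain JSON lines."""
    output = []
    for record in event["records"]:
        payload = json.loads(gzip.decompress(base64.b64decode(record["data"])))
        if payload.get("messageType") != "DATA_MESSAGE":
            # Control messages carry no log events; drop them.
            output.append({"recordId": record["recordId"], "result": "Dropped", "data": record["data"]})
            continue
        # Emit just the raw message of each log event, one per line (no timestamp).
        lines = "\n".join(e["message"] for e in payload["logEvents"]) + "\n"
        output.append({
            "recordId": record["recordId"],
            "result": "Ok",
            "data": base64.b64encode(lines.encode("utf-8")).decode("utf-8"),
        })
    return {"records": output}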

How can I delete an existing S3 event notification?

When I try to delete an event notification from S3, I get the following message:
In Text:
Unable to validate the following destination configurations. Not authorized to invoke function [arn:aws:lambda:eu-west-1:FOOBAR:function:FOOBAR]. (arn:aws:lambda:eu-west-1:FOOBAR:function:FOOBAR, null)
Nobody in my organization seems to be able to delete that - not even admins.
When I try to set the same S3 event notification in AWS Lambda as a trigger via the web interface, I get
Configuration is ambiguously defined. Cannot have overlapping suffixes in two rules if the prefixes are overlapping for the same event type. (Service: Amazon S3; Status Code: 400; Error Code: InvalidArgument; Request ID: FOOBAR; S3 Extended Request ID: FOOBAR/FOOBAR/FOOBAR)
How can I delete that existing event notification? How can I further investigate the problem?
I was having the same problem tonight and did the following:
1) Issue the command:
aws s3api put-bucket-notification-configuration --bucket=mybucket --notification-configuration="{}"
2) In the console, delete the troublesome event.
Assuming you have better permissions from the CLI:
aws s3api put-bucket-notification-configuration --bucket=mybucket --notification-configuration='{"LambdaFunctionConfigurations": []}'
Retrieve all the notification configurations of the specific bucket:
aws s3api get-bucket-notification-configuration --bucket=mybucket > notification.sh
The notification.sh file will look like the following:
{
  "LambdaFunctionConfigurations": [
    {
      "Id": ...,
      "LambdaFunctionArn": ...,
      "Events": [...],
      "Filter": {...}
    },
    { ... }
  ]
}
Remove the unwanted notification object from notification.sh, then modify notification.sh like the following:
#!/bin/zsh
aws s3api put-bucket-notification-configuration --bucket=mybucket --notification-configuration='{
  "LambdaFunctionConfigurations": [
    {
      "Id": ...,
      "LambdaFunctionArn": ...,
      "Events": [...],
      "Filter": {...}
    },
    { ... }
  ]
}'
Run the shell script:
source notification.sh
There is no 's3api delete notification-configuration' in the AWS CLI. Only 's3api put-bucket-notification-configuration' is available, and it overrides any previously existing events on the S3 bucket. So, if you wish to delete only a specific event, you need to handle that programmatically.
Something like this:
Step 1. Do a 's3api get-bucket-notification-configuration' and get the s3-notification.json file.
Step 2. Edit this file with your code to produce the required s3-notification.json.
Step 3. Finally, do 's3api put-bucket-notification-configuration' (aws s3api put-bucket-notification-configuration --bucket my-bucket --notification-configuration file://s3-notification.json)
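The same three steps in Python (boto3), assuming the ARN from the error message above is the configuration you want to drop (bucket name is a placeholder):

import boto3

s3 = boto3.client("s3")
bucket = "my-bucket"  # placeholder bucket name
# Placeholder ARN of the notification you want to remove.
unwanted_arn = "arn:aws:lambda:eu-west-1:FOOBAR:function:FOOBAR"

# Step 1: get the current notification configuration.
config = s3.get_bucket_notification_configuration(Bucket=bucket)
config.pop("ResponseMetadata", None)

# Step 2: keep every Lambda configuration except the unwanted one.
config["LambdaFunctionConfigurations"] = [
    c for c in config.get("LambdaFunctionConfigurations", [])
    if c["LambdaFunctionArn"] != unwanted_arn
]

# Step 3: put the edited configuration back.
s3.put_bucket_notification_configuration(Bucket=bucket, NotificationConfiguration=config)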
I had worked on this logic in the AWS CLI; it requires a jq command to merge the JSON output.
I tried this but it doesn't work for me. I uploaded a Lambda with the same function name but without events, then went to the function in the dashboard and added a trigger with the same prefix and suffix. When applying the changes, the dashboard shows an error, but if you go back to the Lambda function you can see the trigger is now linked to it, so afterwards you can remove the Lambda or the events.