How to pass a variable to EMR addStep in AWS Step Functions

AWS Step Functions recently added EMR integration, which is cool, but I couldn't find a way to pass a variable from Step Functions into the addStep Args.
For example, I would like to pass the "$.dayid" variable into "Parameters" > "Step" > "HadoopJarStep" > "Args", similar to "ClusterId.$": "$.ClusterId" (that cluster ID variable works).
{
"Step_One": {
"Type": "Task",
"Resource": "arn:aws:states:::elasticmapreduce:addStep.sync",
"Parameters": {
"ClusterId.$": "$.ClusterId",
"Step": {
"Name": "The first step",
"ActionOnFailure": "CONTINUE",
"HadoopJarStep": {
"Jar": "command-runner.jar",
"Args": [
"hive-script",
"--run-hive-script",
"--args",
"-f",
"s3://<region>.elasticmapreduce.samples/cloudfront/code/Hive_CloudFront.q",
"-d",
"INPUT=s3://<region>.elasticmapreduce.samples",
"-d",
"OUTPUT=s3://<mybucket>/MyHiveQueryResults/$.dayid"
]
}
}
},
"End": true
}
}

Parameters let you define key-value pairs. Since the value of the "Args" key is an array, you won't be able to dynamically reference a specific element in the array; you would need to reference the whole array instead, for example "Args.$": "$.Input.ArgsArray". With that said, you also won't be able to substitute a value inside a string like you are trying to do in "OUTPUT=s3://<mybucket>/MyHiveQueryResults/$.dayid".
So for your use case the best way to achieve this would be to add a pre-processing state before calling this state. In the pre-processing state I would recommend calling a Lambda function to construct the string "OUTPUT=s3://<mybucket>/MyHiveQueryResults/$.dayid" as well as the full array you send to Args (a rough sketch of such a function follows the example below).
{
"StartAt": "Pre-Process",
"States": {
"Pre-Process": {
"Type": "Task",
"Resource": "<Lambda function to generate the string OUTPUT=s3://<mybucket>/MyHiveQueryResults/$.dayid and output the Args array>",
"Next": "Step_One"
},
"Step_One": {
"Type": "Task",
"Resource": "arn:aws:states:::elasticmapreduce:addStep.sync",
"Parameters": {
"ClusterId.$": "$.ClusterId",
"Step": {
"Name": "The first step",
"ActionOnFailure": "CONTINUE",
"HadoopJarStep": {
"Jar": "command-runner.jar",
"Args.$": "$.ArgsGeneratedByPreProcessingState"
}
}
},
"End": true
}
}
}
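For illustration only, the pre-processing Lambda could look something like the sketch below (Python; the input field names dayid and ClusterId and the output key ArgsGeneratedByPreProcessingState are assumptions that must match whatever your execution input and the Step_One state actually use):

# Hypothetical pre-processing Lambda: builds the Args array for the EMR step.
# <region> and <mybucket> are the same placeholders used in the question.
def lambda_handler(event, context):
    output_arg = "OUTPUT=s3://<mybucket>/MyHiveQueryResults/{}".format(event["dayid"])
    return {
        # Pass ClusterId through so "$.ClusterId" still resolves in Step_One
        "ClusterId": event["ClusterId"],
        "ArgsGeneratedByPreProcessingState": [
            "hive-script",
            "--run-hive-script",
            "--args",
            "-f",
            "s3://<region>.elasticmapreduce.samples/cloudfront/code/Hive_CloudFront.q",
            "-d",
            "INPUT=s3://<region>.elasticmapreduce.samples",
            "-d",
            output_arg,
        ],
    }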

Step Functions now has intrinsic functions that can help this situation.
"PayloadString.$": "States.Format('[[{}]]', States.JsonToString($.in.summary))",
"CmdLine.$": "States.Array('--maxp', $.params.maxpr, '--minp', $.params.minpr)"
Can't believe it took this long for these functions to become available.
See documentation

Related

Deeply nested unevaluatedProperties and their expectations

I have been working on my own validator for JSON Schema and FINALLY have most of how unevaluatedProperties is supposed to work figured out... I think. That's one tricky piece there! However, I really just want to confirm one thing. Given the following schema and JSON, what is the expected outcome? I have tried it with https://www.jsonschemavalidator.net and gotten an answer, but I was hoping I could get a more definitive answer.
The focus is that the faz property is in fact being evaluated, but the keyword that disallows unevaluated properties comes from a deeply nested schema.
Thoughts?
Here is the schema...
{
"type": "object",
"properties": {
"foo": {
"type": "object",
"properties": {
"bar": {
"type": "string"
}
},
"unevaluatedProperties": false
}
},
"anyOf": [
{
"properties": {
"foo": {
"properties": {
"faz": {
"type": "string"
}
}
}
}
}
]
}
Here is the JSON...
{
"foo": {
"bar": "test",
"faz": "test"
}
}
That schema will successfully evaluate against the provided data. The unevaluatedProperties keyword will be aware of properties evaluated in subschemas of adjacent keywords, and is evaluated after all other applicator keywords, so it will see the annotation produced from within the anyOf subschema, also.
Evaluating this keyword is easy if you follow the specification literally -- it uses annotations to decide what to do. You just need to make sure that all keywords either produce annotations correctly or propagate annotations correctly that were produced by other keywords, and then all the information is available to generate the correct result.
The result produced by my implementation is:
{
"annotations" : [
{
"annotation" : [
"faz"
],
"instanceLocation" : "/foo",
"keywordLocation" : "/anyOf/0/properties/foo/properties"
},
{
"annotation" : [
"foo"
],
"instanceLocation" : "",
"keywordLocation" : "/anyOf/0/properties"
},
{
"annotation" : [
"bar"
],
"instanceLocation" : "/foo",
"keywordLocation" : "/properties/foo/properties"
},
{
"annotation" : [],
"instanceLocation" : "/foo",
"keywordLocation" : "/properties/foo/unevaluatedProperties"
},
{
"annotation" : [
"foo"
],
"instanceLocation" : "",
"keywordLocation" : "/properties"
}
],
"valid" : true
}
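If you want to cross-check against yet another implementation, here is a quick sketch using the Python jsonschema library (assuming a version that supports draft 2020-12, i.e. jsonschema 4.x); it simply reports whether the instance validates and prints any errors:

# Sketch: validate the schema/instance pair from the question with python-jsonschema.
from jsonschema import Draft202012Validator

schema = {
    "type": "object",
    "properties": {
        "foo": {
            "type": "object",
            "properties": {"bar": {"type": "string"}},
            "unevaluatedProperties": False,
        }
    },
    "anyOf": [
        {"properties": {"foo": {"properties": {"faz": {"type": "string"}}}}}
    ],
}
instance = {"foo": {"bar": "test", "faz": "test"}}

validator = Draft202012Validator(schema)
for error in validator.iter_errors(instance):
    print(error.message)
print("valid:", validator.is_valid(instance))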
This is not an answer but a follow-up example which I feel is in the same vein. I feel this guides us to the answer.
Here we have a single object being validated, but the unevaluated keyword resides in two different schemas, each part of a different set of "adjacent keyword subschemas" (from the core spec, http://json-schema.org/draft/2020-12/json-schema-core.html#rfc.section.11).
How should this be resolved? If all annotations must be evaluated, then in what order do I evaluate them? The oneOf first or the anyOf? According to the spec, an unevaluated keyword (unevaluatedProperties or unevaluatedItems) generates annotation results, which means that result would affect any other unevaluated keyword.
http://json-schema.org/draft/2020-12/json-schema-core.html#unevaluatedProperties
"The annotation result of this keyword is the set of instance property names validated by this keyword's subschema."
This is as far as my understanding of the spec goes.
According to the two validators I am using, this fails.
Schema
{
"$schema": "https://json-schema.org/draft/2019-09/schema",
"type": "object",
"properties": {
"foo": {
"type": "string"
}
},
"oneOf": [
{
"properties": {
"faz": {
"type": "string"
}
},
"unevaluatedProperties": true
}
],
"anyOf": [
{
"properties": {
"bar": {
"type": "string"
}
},
"unevaluatedProperties": false
}
]
}
Data
{
"bar": "test",
"faz": "test"
}

Is it possible to call lambda function from other lambda functions in AWS serverless API?

I am creating a serverless API using AWS SAM template and ASP.Net Core.
I wanted to know if it is possible to call a common Lambda function from multiple Lambda functions.
I have 2 APIs for user authentication.
/user/authenticate
/admin/authenticate
Now when the user calls these API endpoints, I want to call a common Lambda function which will look like the following:
public AuthResponse Authenticate(AuthInfo info, int role);
I get the user role based on which API endpoint is called. For example, if /user/authenticate is called then role=1, otherwise role=0.
And then I want the Authenticate() Lambda to perform user authentication based on the AuthInfo + role.
I want to do this because all my users are stored in the same table and I would like to cross-verify that the user has the correct role to access the feature.
I will also share a portion of serverless.template used for above APIs.
/admin/authenticate
"Handler": "Server::API.Admin::Authenticate",
"Description" : "Allows admin to authenticate",
"Runtime": "dotnetcore2.1",
"CodeUri": "",
"MemorySize": 256,
"Timeout" : 300,
"Role": {"Fn::GetAtt" : [ "LambdaExecutionRole", "Arn"]},
"FunctionName" : "AdminAuthenticate",
"Events":
{
"PutResource":
{
"Type": "Api",
"Properties":
{
"Path": "/v1/admin/authenticate",
"Method": "POST"
}
}
}
}
}
/user/authenticate
"Handler": "Server::API.User::Authenticate",
"Description" : "Allows user to authenticate",
"Runtime": "dotnetcore2.1",
"CodeUri": "",
"MemorySize": 256,
"Timeout" : 300,
"Role": {"Fn::GetAtt" : [ "LambdaExecutionRole", "Arn"]},
"FunctionName" : "UserAuthenticate",
"Events":
{
"PutResource":
{
"Type": "Api",
"Properties":
{
"Path": "/v1/user/authenticate",
"Method": "GET"
}
}
}
}
}
As you can see above, 2 Lambda functions are created, AdminAuthenticate and UserAuthenticate. I want these Lambda functions to share common code.
Does anyone have any idea how to do it?
Thanks and Regards.
I can think of 2 options to achieve your goal. In the first option, you use multiple Lambda functions, one for each endpoint, both pointing to your same codebase. In the second option, you have a single Lambda function that handles all authentication needs.
Single codebase, multiple functions
In this case, you can define your template file with 2 functions but use the CodeUri property to point to the same codebase.
{
"AWSTemplateFormatVersion": "2010-09-09",
"Transform": "AWS::Serverless-2016-10-31",
"Resources": {
"AdminFunction": {
"Type": "AWS::Serverless::Function",
"Properties": {
"Handler": "Server::API.Admin::Authenticate",
"Description": "Allows admin to authenticate",
"Runtime": "dotnetcore2.1",
"CodeUri": "./codebase_path/",
"MemorySize": 256,
"Timeout": 300,
"FunctionName": "AdminAuthenticate",
"Events": {
"PutResource": {
"Type": "Api",
"Properties": {
"Path": "/v1/admin/authenticate",
"Method": "POST"
}
}
}
}
},
"UserFunction": {
"Type": "AWS::Serverless::Function",
"Properties": {
"Handler": "Server::API.User::Authenticate",
"Description": "Allows user to authenticate",
"Runtime": "dotnetcore2.1",
"CodeUri": "./codebase_path/",
"MemorySize": 256,
"Timeout": 300,
"FunctionName": "UserAuthenticate",
"Events": {
"PutResource": {
"Type": "Api",
"Properties": {
"Path": "/v1/user/authenticate",
"Method": "POST"
}
}
}
}
}
}
}
Single codebase, single function
In this case, you will expose 2 endpoints on API Gateway, but they will be directed to the same handler on your function. Therefore, you will need to write some logic in your code to handle the login properly. The event object passed to your Lambda function will have information on the original URL in the path property (reference -- even though this is for Lambda proxy, it still applies); a rough handler sketch follows the template below.
The template file in this case would be similar to the following (note I replaced the Admin/User terms with "Any", since this will handle any form of authentication):
{
"AWSTemplateFormatVersion": "2010-09-09",
"Transform": "AWS::Serverless-2016-10-31",
"Resources": {
"AnyFunction": {
"Type": "AWS::Serverless::Function",
"Properties": {
"Handler": "Server::API.Any::Authenticate",
"Description": "Allows any to authenticate",
"Runtime": "dotnetcore2.1",
"CodeUri": "./hello_world/",
"MemorySize": 256,
"Timeout": 300,
"Events": {
"UserEndpoint": {
"Type": "Api",
"Properties": {
"Path": "/v1/user/authenticate",
"Method": "POST"
}
},
"AdminEndpoint": {
"Type": "Api",
"Properties": {
"Path": "/v1/admin/authenticate",
"Method": "POST"
}
}
}
}
}
}
}
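Inside the handler you then branch on the request path. A minimal sketch of the idea (shown in Python for brevity -- the same branching applies to your .NET handler; the role values 1/0 mirror the question, and the authenticate stub stands in for your shared logic):

# Hypothetical single-function handler: routes both endpoints to shared logic.
import json

def authenticate(auth_info, role):
    # Placeholder for the common authentication code described in the question
    return {"authenticated": bool(auth_info), "role": role}

def lambda_handler(event, context):
    # With API Gateway (proxy) events the original request path is available here
    path = event.get("path", "")
    role = 1 if path.endswith("/user/authenticate") else 0
    auth_info = json.loads(event.get("body") or "{}")
    result = authenticate(auth_info, role)
    return {"statusCode": 200, "body": json.dumps(result)}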
You can invoke Lambda functions from any other Lambda function, using the AWS SDK in your chosen language, which has an invoke function defined.
For reference, here is the link to the boto3 invoke definition.
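As a minimal sketch of that approach (Python/boto3; the function name "Authenticate" and the payload fields are placeholders for your own):

# Sketch: synchronously invoke a shared Lambda function from another Lambda.
import json
import boto3

lambda_client = boto3.client("lambda")

def call_authenticate(auth_info, role):
    response = lambda_client.invoke(
        FunctionName="Authenticate",          # placeholder function name
        InvocationType="RequestResponse",     # wait for the result
        Payload=json.dumps({"AuthInfo": auth_info, "Role": role}),
    )
    return json.loads(response["Payload"].read())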
Also
The approach you are using for authentication with a common codebase is not the right one.
If you need a Lambda function to check or authenticate a particular request, you can set up a custom authorizer for that, which in your terms means sharing the Lambda code, or calling it first before invoking the Lambda you set up for the particular endpoint, with the possibility to pass custom data if you want.
A Lambda authorizer (also known as a custom authorizer) is an API Gateway feature that uses a Lambda function to control access to your API.
If this still doesn't solve your problem and you want a common codebase, you can point as many API endpoints as you like to the same Lambda function.
Then you have to handle event['resources'] inside your codebase.
This is not the recommended way, but you can use it.
You can refer to the AWS samples for setting up a custom authorizer, or the documentation is fair enough to clear all your doubts.
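For illustration, a minimal token-based Lambda authorizer could look like the sketch below (Python; the token comparison is a placeholder -- in practice you would validate against your user table and role, as described in the question):

# Hypothetical Lambda authorizer sketch: API Gateway calls this before your endpoint
# and allows or denies the request based on the returned IAM policy.
def lambda_handler(event, context):
    token = event.get("authorizationToken", "")
    effect = "Allow" if token == "expected-token" else "Deny"  # placeholder check
    return {
        "principalId": "user",
        "policyDocument": {
            "Version": "2012-10-17",
            "Statement": [
                {
                    "Action": "execute-api:Invoke",
                    "Effect": effect,
                    "Resource": event["methodArn"],
                }
            ],
        },
        # Optional custom data passed through to the backend integration
        "context": {"role": "admin" if effect == "Allow" else "none"},
    }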

How to pass AWS Lambda error in AWS SNS notification through AWS Step Functions?

I have created an AWS Step Function which triggers a Python Lambda, terminates without error if the Lambda succeeds, and otherwise calls an SNS topic to message the subscribed users if the Lambda fails. It is running, but the message is a fixed string. The Step Function JSON is as follows:
{
"StartAt": "Lambda Trigger",
"States": {
"Lambda Trigger": {
"Type": "Task",
"Resource": "arn:aws:lambda:us-east-2:xxxxxxxxxxxx:function:helloworldTest",
"End": true,
"Catch": [
{
"ErrorEquals": [
"States.ALL"
],
"ResultPath": "$.error",
"Next": "Notify Failure"
}
]
},
"Notify Failure": {
"Type": "Task",
"Resource": "arn:aws:states:::sns:publish",
"Parameters": {
"Message": "Batch job submitted through Step Functions failed with the following error, $.error",
"TopicArn": "arn:aws:sns:us-east-2:xxxxxxxxxxxx:lambda-execution-failure"
},
"End": true
}
}
}
The only thing is, I want to append the failure error message to my message string, which I tried, but it is not working as expected.
Instead, the mail I get contains the literal "$.error" text rather than the actual error. How do I go about it?
I could solve the problem using "Error.$": "$.Cause".
The following is a working example of the failure portion of state machine:
"Job Failure": {
"Type": "Task",
"Resource": "arn:aws:states:::sns:publish",
"Parameters": {
"Subject": "Lambda Job Failed",
"Message": {
"Alarm": "Lambda Job Failed",
"Error.$": "$.Cause"
},
"TopicArn": "arn:aws:sns:us-east-2:xxxxxxxxxxxx:Job-Run-Notification"
},
"End": true
}
Hope this helps!
Here is the full version of the code
{
"Comment": "A Hello World example of the Amazon States Language using an AWS Lambda function",
"StartAt": "HelloWorld",
"States": {
"HelloWorld": {
"Type": "Task",
"Resource": "arn:aws:lambda:us-east-1:XXXXXXXXXXXXX:function:StepFunctionTest",
"End": true,
"Catch": [
{
"ErrorEquals": [
"States.ALL"
],
"Next": "NotifyFailure"
}
]
},
"NotifyFailure": {
"Type": "Task",
"Resource": "arn:aws:states:::sns:publish",
"Parameters": {
"Subject": "[ERROR]: Task failed",
"Message": {
"Alarm": "Batch job submitted through Step Functions failed with the following error",
"Error.$": "$.Cause"
},
"TopicArn": "arn:aws:sns:us-east-1:XXXXXXXXXXXXX:Notificaiton"
},
"End": true
}
}
}
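For completeness, the Lambda itself only needs to raise for this failure path to fire; a minimal test sketch (Python, hypothetical function body) is below. When the task fails, Step Functions places the Lambda's error output (errorMessage, errorType, stack trace) as a JSON string in the catcher's Cause field, which is what "Error.$": "$.Cause" forwards to SNS.

# Hypothetical Lambda used to exercise the NotifyFailure path of the state machine.
def lambda_handler(event, context):
    # Raising here fails the Step Functions task; the exception details end up
    # serialized in $.Cause and are included in the SNS message.
    raise RuntimeError("Batch job failed for input: {}".format(event))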
This line is already appending the exception object to the 'error' path:
"ResultPath": "$.error"
We just need to pass '$' to Message.$ in the SNS task, and both the input and the error details will be sent to SNS:
{
"TopicArn":"${SnsTopic}",
"Message.$":"$"
}
If we don't want the input to the Lambda to be appended to the email, we should skip ResultPath or have just '$' as the ResultPath, so the input object is ignored:
"ResultPath": "$"

Is there a way to get Step Functions input values into EMR step Args

We are running batch Spark jobs using AWS EMR clusters. Those jobs run periodically and we would like to orchestrate them via AWS Step Functions.
As of November 2019, Step Functions supports EMR natively. When adding a step to the cluster we can use the following config:
"Some Step": {
"Type": "Task",
"Resource": "arn:aws:states:::elasticmapreduce:addStep.sync",
"Parameters": {
"ClusterId.$": "$.cluster.ClusterId",
"Step": {
"Name": "FirstStep",
"ActionOnFailure": "CONTINUE",
"HadoopJarStep": {
"Jar": "command-runner.jar",
"Args": [
"spark-submit",
"--class",
"com.some.package.Class",
"JarUri",
"--startDate",
"$.time",
"--daysToLookBack",
"$.daysToLookBack"
]
}
}
},
"Retry" : [
{
"ErrorEquals": [ "States.ALL" ],
"IntervalSeconds": 1,
"MaxAttempts": 1,
"BackoffRate": 2.0
}
],
"ResultPath": "$.firstStep",
"End": true
}
Within the Args List of the HadoopJarStep we would like to set arguments dynamically. e.g. if the input of the state machine execution is:
{
"time": "2020-01-08",
"daysToLookBack": 2
}
The strings in the config starting with "$." should be replaced accordingly when executing the State Machine, and the step on the EMR cluster should run command-runner.jar spark-submit --class com.some.package.Class JarUri --startDate 2020-01-08 --daysToLookBack 2. But instead it runs command-runner.jar spark-submit --class com.some.package.Class JarUri --startDate $.time --daysToLookBack $.daysToLookBack.
Does anyone know if there is a way to do this?
Parameters let you define key-value pairs. Since the value of the "Args" key is an array, you won't be able to dynamically reference a specific element in the array; you would need to reference the whole array instead, for example "Args.$": "$.Input.ArgsArray".
So for your use case the best way to achieve this would be to add a pre-processing state before calling this state. In the pre-processing state you can either call a Lambda function and format your input/output through code, or, for something as simple as adding a dynamic value to an array, you can use a Pass State to reformat the data; then inside your task State Parameters you can use JSONPath to get the array which you defined in the pre-processor. Here's an example:
{
"Comment": "A Hello World example of the Amazon States Language using Pass states",
"StartAt": "HardCodedInputs",
"States": {
"HardCodedInputs": {
"Type": "Pass",
"Parameters": {
"cluster": {
"ClusterId": "ValueForClusterIdVariable"
},
"time": "ValueForTimeVariable",
"daysToLookBack": "ValueFordaysToLookBackVariable"
},
"Next": "Pre-Process"
},
"Pre-Process": {
"Type": "Pass",
"Parameters": {
"FormattedInputsForEmr": {
"ClusterId.$": "$.cluster.ClusterId",
"Args": [
{
"Arg1": "spark-submit"
},
{
"Arg2": "--class"
},
{
"Arg3": "com.some.package.Class"
},
{
"Arg4": "JarUri"
},
{
"Arg5": "--startDate"
},
{
"Arg6.$": "$.time"
},
{
"Arg7": "--daysToLookBack"
},
{
"Arg8.$": "$.daysToLookBack"
}
]
}
},
"Next": "Some Step"
},
"Some Step": {
"Type": "Pass",
"Parameters": {
"ClusterId.$": "$.FormattedInputsForEmr.ClusterId",
"Step": {
"Name": "FirstStep",
"ActionOnFailure": "CONTINUE",
"HadoopJarStep": {
"Jar": "command-runner.jar",
"Args.$": "$.FormattedInputsForEmr.Args[*][*]"
}
}
},
"End": true
}
}
}
You can use the States.Array() intrinsic function. Your Parameters becomes:
"Parameters": {
"ClusterId.$": "$.cluster.ClusterId",
"Step": {
"Name": "FirstStep",
"ActionOnFailure": "CONTINUE",
"HadoopJarStep": {
"Jar": "command-runner.jar",
"Args.$": "States.Array('spark-submit', '--class', 'com.some.package.Class', 'JarUri', '--startDate', $.time, '--daysToLookBack', '$.daysToLookBack')"
}
}
}
Intrinsic functions are documented here, but I don't think the documentation explains the usage very well. The code snippets provided in the Step Functions console are more useful.
Note that you can also do string formatting on the args using States.Format(). For example, you could construct a path using an input variable as the final path segment:
"Args.$": "States.Array('mycommand', '--path', States.Format('my/base/path/{}', $.someInputVariable))"

ADF V2 failure when using bool variable

Very simple issue. I am trying to set up a pipeline that has a variable of type bool. As soon as I add it, the pipeline fails with:
{
"code":"BadRequest",
"message":"Invalid value for property 'type'",
"target":"pipeline/pipeline2/runid/66b9c7be-9894-494a-abd9-34fd92bbd972",
"details":null,
"error":null
}
A simple pipeline with a string variable and a Wait activity succeeds:
{
"name": "pipeline2",
"properties": {
"activities": [
{
"name": "Wait1",
"type": "Wait",
"typeProperties": {
"waitTimeInSeconds": 1
}
}
],
"variables": {
"Test": {
"type": "String",
"defaultValue": "\"Hello\""
}
}
}
}
When I add a bool variable and nothing else, it fails to debug:
{
"name": "pipeline2",
"properties": {
"activities": [
{
"name": "Wait1",
"type": "Wait",
"typeProperties": {
"waitTimeInSeconds": 1
}
}
],
"variables": {
"Test": {
"type": "String",
"defaultValue": "\"Hello\""
},
"TestBool": {
"type": "Bool",
"defaultValue": false
}
}
}
}
Any clue how to get this to work? I am trying to use this variable as a condition for an Until loop.
Many thanks.
OK, I experimented.
If I go into the code and set the type as boolean rather than Bool, then the above pipeline runs.
Looks like a UI bug in the designer that sets the type to Bool. I'll file a bug report.
Mark.
Update: OK, it runs, but I can't set a default value (it disappears), and anything that references the value causes an Internal Server Error (presumably because it is null, which is invalid for a Boolean). Definitely something for the engineers to look at.
Update 2: It appears you can set the variable with SetVariable without error, but it appears not to do anything. The value is always true in my test case.
Update 3: Microsoft has a fix coming next week.