Pass environment variables in Step Functions

I would like to pass an environment variable (like prod or dev) in Step Functions so that a PySpark program can read it with os.getenv(). Many programs use the same environment variables, so passing them as arguments to each program would take some time. I tried the code below:
[
  {
    "Classification": "yarn-env",
    "Properties": {},
    "Configurations": [
      {
        "Classification": "export",
        "Properties": {
          "VARIABLE_NAME": "VARIABLE_VALUE"
        }
      }
    ]
  }
]
but it didn't work. It seems this code only works for setting YARN configurations.
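For reference, this is how the jobs read the variable (VARIABLE_NAME and the bucket paths below are placeholders):

    import os

    # The stage ("prod" or "dev") is expected in an environment variable;
    # VARIABLE_NAME stands in for the real name used across the jobs.
    stage = os.getenv("VARIABLE_NAME", "dev")
    input_path = "s3://prod-bucket/input/" if stage == "prod" else "s3://dev-bucket/input/"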

Related

Is there a way to extract rendered preset information from CMake?

I have a CMakePresets.json file which makes use of inheritance and macro expansion.
Here is an excerpt; in reality I use multiple versions of "Foo":
{
  "configurePresets": [
    {
      "name": "default",
      "hidden": true,
      "generator": "Unix Makefiles",
      "binaryDir": "cmake-build-${presetName}",
      "environment": {
        "PATH": "/opt/foo/$env{FOO_VERSION}/bin:$penv{PATH}",
        "LD_LIBRARY_PATH": "/opt/foo/$env{FOO_VERSION}.0/lib"
      },
      "cacheVariables": {
        "FOO_VERSION": "$env{FOO_VERSION}"
      }
    },
    {
      "name": "debug-foo1",
      "inherits": "default",
      "environment": { "FOO_VERSION": "1" }
    },
    {
      "name": "release-foo1",
      "inherits": "debug-foo1",
      "cacheVariables": { "CMAKE_BUILD_TYPE": "Release" }
    }
  ],
  "buildPresets": [
    {
      "name": "debug-foo1",
      "configurePreset": "debug-foo1"
    },
    {
      "name": "release-foo1",
      "configurePreset": "release-foo1"
    }
  ]
}
Now assume I select the preset release-foo1. This would render the following variables, among others:
binaryDir = "cmake-build-release-foo1"
FOO_VERSION = "1"
LD_LIBRARY_PATH = "/opt/foo/1.0/lib"
Is there a way to query these results for a given preset? For example, given release-foo1, I want to know the resulting binaryDir.
Of course I could parse the JSON myself, but that seems tedious, especially because of the cross references and substitutions which are being made by CMake.
You can use the -N flag to print the computed presets info without running the configure or generate steps. After modifying the presets file in your question to work, I created an empty CMakeLists.txt next to it and ran this test (*** stands in for my PATH):
$ cmake --preset=release-foo1 -N
Preset CMake variables:
CMAKE_BUILD_TYPE="Release"
FOO_VERSION="1"
Preset environment variables:
FOO_VERSION="1"
LD_LIBRARY_PATH="/opt/foo/1.0/lib"
PATH="/opt/foo/1/bin:***"
This is good enough for debugging your presets. If you need to go further, you could process the output using standard unix command line tools, like awk, grep, cut, etc.
I think this is the best you can do for now (CMake <=3.21) since there's nothing else in the command line documentation or the file API for IDEs that I could find.
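If grep and friends feel too ad hoc, the same post-processing works from a script. A rough Python sketch (it assumes the NAME="value" lines shown in the output above):

    import re
    import subprocess

    def preset_value(preset, name):
        # Run `cmake --preset=<preset> -N` and extract one NAME="value"
        # line from the printed preset information.
        out = subprocess.run(
            ["cmake", f"--preset={preset}", "-N"],
            capture_output=True, text=True, check=True,
        ).stdout
        match = re.search(rf'^\s*{re.escape(name)}="(.*)"$', out, re.MULTILINE)
        return match.group(1) if match else None

    print(preset_value("release-foo1", "LD_LIBRARY_PATH"))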
Building on Alex Reinking's answer, you can get the binaryDir (and other parts) with the -N option if you tweak the preset a little.
You need to add an environment variable and reference it for binaryDir.
See this tweaked example from the CMake documentation:
{
  "version": 2,
  "cmakeMinimumRequired": {
    "major": 3,
    "minor": 20,
    "patch": 0
  },
  "configurePresets": [
    {
      "name": "default",
      "displayName": "Default Config",
      "description": "Default build using Ninja generator",
      "generator": "Ninja",
      "binaryDir": "$env{BUILD_DIR}",
      "cacheVariables": {
        "FIRST_CACHE_VARIABLE": {
          "type": "BOOL",
          "value": "OFF"
        },
        "SECOND_CACHE_VARIABLE": "ON"
      },
      "environment": {
        "BUILD_DIR": "${sourceDir}/build/default",
        "MY_ENVIRONMENT_VARIABLE": "Test",
        "PATH": "$env{HOME}/ninja/bin:$penv{PATH}"
      },
      "vendor": {
        "example.com/ExampleIDE/1.0": {
          "autoFormat": true
        }
      }
    },
    {
      "name": "ninja-multi",
      "inherits": "default",
      "displayName": "Ninja Multi-Config",
      "description": "Default build using Ninja Multi-Config generator",
      "generator": "Ninja Multi-Config"
    }
  ],
  "buildPresets": [
    {
      "name": "default",
      "configurePreset": "default"
    }
  ],
  "testPresets": [
    {
      "name": "default",
      "configurePreset": "default",
      "output": { "outputOnFailure": true },
      "execution": { "noTestsAction": "error", "stopOnFailure": true }
    }
  ],
  "vendor": {
    "example.com/ExampleIDE/1.0": {
      "autoFormat": false
    }
  }
}
This will give you:
$ cmake --preset default -N
Preset CMake variables:
FIRST_CACHE_VARIABLE:BOOL="OFF"
SECOND_CACHE_VARIABLE="ON"
Preset environment variables:
=>BUILD_DIR="D:/tmp/presets/build/default"<=
MY_ENVIRONMENT_VARIABLE="Test"
PATH="..."

dotnet-monitor and OpenTelemetry?

I'm learning OpenTelemetry and I wonder how dotnet-monitor is connected with OpenTelemetry (Meter). Are those things somehow connected, or is dotnet-monitor just a custom Microsoft tool that does not use the OpenTelemetry standards (API, SDK and exporters)?
If you run dotnet-monitor on your machine, it exposes the dotnet metrics in Prometheus format, which means you can set up the OpenTelemetry Collector to scrape those metrics.
For example, in the OpenTelemetry-collector-contrib configuration:
receivers:
  prometheus_exec:
    exec: dotnet monitor collect
    port: 52325
Please note that for dotnet-monitor to run you need to create a settings.json in this path:
$XDG_CONFIG_HOME/dotnet-monitor/settings.json
If $XDG_CONFIG_HOME is not defined, create the file in this path:
$HOME/.config/dotnet-monitor/settings.json
If you want to identify the process by its PID, write this into settings.json (change Value to your PID):
{
  "DefaultProcess": {
    "Filters": [{
      "Key": "ProcessId",
      "Value": "1"
    }]
  }
}
If you want to identify the process by its name, write this into settings.json (change Value to your process name):
{
  "DefaultProcess": {
    "Filters": [{
      "Key": "ProcessName",
      "Value": "iisexpress"
    }]
  }
}
In my example I used this configuration:
{
  "DefaultProcess": {
    "Filters": [{
      "Key": "ProcessId",
      "Value": "1"
    }]
  },
  "Metrics": {
    "Providers": [
      {
        "ProviderName": "System.Net.Http"
      },
      {
        "ProviderName": "Microsoft-AspNetCore-Server-Kestrel"
      }
    ]
  }
}
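To sanity-check the setup before pointing the collector at it, you can read the metrics endpoint directly. A minimal sketch (port 52325 matches the collector config above; the /metrics path is the dotnet-monitor default, adjust if your binding differs):

    import urllib.request

    # Fetch the Prometheus-format metrics that dotnet-monitor exposes.
    with urllib.request.urlopen("http://localhost:52325/metrics") as resp:
        print(resp.read().decode())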

Is there a way to get Step Functions input values into EMR step Args

We are running batch spark jobs using AWS EMR clusters. Those jobs run periodically and we would like to orchestrate those via AWS Step Functions.
As of November 2019 Step Functions has support for EMR natively. When adding a Step to the cluster we can use the following config:
"Some Step": {
"Type": "Task",
"Resource": "arn:aws:states:::elasticmapreduce:addStep.sync",
"Parameters": {
"ClusterId.$": "$.cluster.ClusterId",
"Step": {
"Name": "FirstStep",
"ActionOnFailure": "CONTINUE",
"HadoopJarStep": {
"Jar": "command-runner.jar",
"Args": [
"spark-submit",
"--class",
"com.some.package.Class",
"JarUri",
"--startDate",
"$.time",
"--daysToLookBack",
"$.daysToLookBack"
]
}
}
},
"Retry" : [
{
"ErrorEquals": [ "States.ALL" ],
"IntervalSeconds": 1,
"MaxAttempts": 1,
"BackoffRate": 2.0
}
],
"ResultPath": "$.firstStep",
"End": true
}
Within the Args list of the HadoopJarStep we would like to set arguments dynamically, e.g. if the input of the state machine execution is:
{
  "time": "2020-01-08",
  "daysToLookBack": 2
}
The strings in the config starting with "$." should be replaced accordingly when executing the State Machine, and the step on the EMR cluster should run command-runner.jar spark-submit --class com.some.package.Class JarUri --startDate 2020-01-08 --daysToLookBack 2. But instead it runs command-runner.jar spark-submit --class com.some.package.Class JarUri --startDate $.time --daysToLookBack $.daysToLookBack.
Does anyone know if there is a way to do this?
Parameters let you define key-value pairs, and since the value for the "Args" key is an array, you can't dynamically reference a specific element in the array; you need to reference the whole array instead, for example "Args.$": "$.Input.ArgsArray".
So for your use-case the best way to achieve this would be to add a pre-processing state before calling this state. In the pre-processing state you can either call a Lambda function and format your input/output through code, or, for something as simple as adding a dynamic value to an array, use a Pass state to reformat the data; then inside your Task state's Parameters you can use JSONPath to get the array which you defined in the pre-processor. Here's an example:
{
  "Comment": "A Hello World example of the Amazon States Language using Pass states",
  "StartAt": "HardCodedInputs",
  "States": {
    "HardCodedInputs": {
      "Type": "Pass",
      "Parameters": {
        "cluster": {
          "ClusterId": "ValueForClusterIdVariable"
        },
        "time": "ValueForTimeVariable",
        "daysToLookBack": "ValueFordaysToLookBackVariable"
      },
      "Next": "Pre-Process"
    },
    "Pre-Process": {
      "Type": "Pass",
      "Parameters": {
        "FormattedInputsForEmr": {
          "ClusterId.$": "$.cluster.ClusterId",
          "Args": [
            { "Arg1": "spark-submit" },
            { "Arg2": "--class" },
            { "Arg3": "com.some.package.Class" },
            { "Arg4": "JarUri" },
            { "Arg5": "--startDate" },
            { "Arg6.$": "$.time" },
            { "Arg7": "--daysToLookBack" },
            { "Arg8.$": "$.daysToLookBack" }
          ]
        }
      },
      "Next": "Some Step"
    },
    "Some Step": {
      "Type": "Pass",
      "Parameters": {
        "ClusterId.$": "$.FormattedInputsForEmr.ClusterId",
        "Step": {
          "Name": "FirstStep",
          "ActionOnFailure": "CONTINUE",
          "HadoopJarStep": {
            "Jar": "command-runner.jar",
            "Args.$": "$.FormattedInputsForEmr.Args[*][*]"
          }
        }
      },
      "End": true
    }
  }
}
You can use the States.Array() intrinsic function. Your Parameters block becomes:
"Parameters": {
"ClusterId.$": "$.cluster.ClusterId",
"Step": {
"Name": "FirstStep",
"ActionOnFailure": "CONTINUE",
"HadoopJarStep": {
"Jar": "command-runner.jar",
"Args.$": "States.Array('spark-submit', '--class', 'com.some.package.Class', 'JarUri', '--startDate', $.time, '--daysToLookBack', '$.daysToLookBack')"
}
}
}
Intrinsic functions are documented here, but I don't think the documentation explains the usage very well. The code snippets provided in the Step Functions console are more useful.
Note that you can also do string formatting on the args using States.Format(). For example, you could construct a path using an input variable as the final path segment:
"Args.$": "States.Array('mycommand', '--path', States.Format('my/base/path/{}', $.someInputVariable))"

vs code C++ include list of paths

I am using C++ in VS Code and would like to automatically pull in changing external include paths so that clang can find them. Currently I have to edit the c_cpp_properties.json file like so:
{
  "configurations": [
    {
      "name": "Linux",
      "includePath": [
        "${workspaceFolder}/**",
        "/some/path/headers1/**",
        "/some/path/headers2/**",
        "/some/path/headers3/**",
        ...
        "/some/path/headersN/**"
      ],
      ...
    }
  ]
}
Can I instead do something like this?
{
  "configurations": [
    {
      "name": "Linux",
      "includePath": [
        "${workspaceFolder}/**",
        "${workspaceFolder}/headers.txt"
      ],
      ...
    }
  ]
}
Where ${workspaceFolder}/headers.txt is:
/some/path/headers1/
/some/path/headers2/
/some/path/headers3/
OR
"/some/path/headers1/**",
"/some/path/headers2/**",
"/some/path/headers3/**"
to be compliant with the original JSON includePath format. Is there a way for me to reference ${workspaceFolder}/headers.txt in c_cpp_properties.json so that I don't have to manually update the paths?
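For reference, the workaround I have in mind is to regenerate the file with a small script along these lines (a rough sketch, assuming headers.txt uses the first, bare-path format and that c_cpp_properties.json contains plain JSON without comments):

    import json
    from pathlib import Path

    # Rebuild the includePath list in c_cpp_properties.json from headers.txt
    # (one bare directory path per line; blank lines are skipped).
    props_path = Path(".vscode/c_cpp_properties.json")
    header_paths = [
        line.strip().rstrip("/") + "/**"
        for line in Path("headers.txt").read_text().splitlines()
        if line.strip()
    ]

    props = json.loads(props_path.read_text())
    props["configurations"][0]["includePath"] = ["${workspaceFolder}/**"] + header_paths
    props_path.write_text(json.dumps(props, indent=4))

But a native way to reference the list from c_cpp_properties.json itself would be much nicer.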

How to pass a variable to EMR addStep in AWS StepFunctions

AWS Step Functions recently added EMR integration, which is cool, but I couldn't find a way to pass a variable from Step Functions into the addStep args.
For example, I would like to pass the "$.dayid" variable into "Parameters" > "Step" > "HadoopJarStep" > "Args". Similar to "ClusterId.$": "$.ClusterId" (this cluster ID variable works).
{
  "Step_One": {
    "Type": "Task",
    "Resource": "arn:aws:states:::elasticmapreduce:addStep.sync",
    "Parameters": {
      "ClusterId.$": "$.ClusterId",
      "Step": {
        "Name": "The first step",
        "ActionOnFailure": "CONTINUE",
        "HadoopJarStep": {
          "Jar": "command-runner.jar",
          "Args": [
            "hive-script",
            "--run-hive-script",
            "--args",
            "-f",
            "s3://<region>.elasticmapreduce.samples/cloudfront/code/Hive_CloudFront.q",
            "-d",
            "INPUT=s3://<region>.elasticmapreduce.samples",
            "-d",
            "OUTPUT=s3://<mybucket>/MyHiveQueryResults/$.dayid"
          ]
        }
      }
    },
    "End": true
  }
}
Parameters let you define key-value pairs, and since the value for the "Args" key is an array, you can't dynamically reference a specific element in the array; you need to reference the whole array instead, for example "Args.$": "$.Input.ArgsArray". With that said, you also won't be able to substitute a value inside a string like you are trying to do in "OUTPUT=s3://<mybucket>/MyHiveQueryResults/$.dayid".
So for your use-case the best way to achieve this would be to add a pre-processing state before calling this state. In the pre-processing state I would recommend you call a Lambda function to construct the string "OUTPUT=s3://<mybucket>/MyHiveQueryResults/$.dayid" as well as the full array you send to Args.
{
  "StartAt": "Pre-Process",
  "States": {
    "Pre-Process": {
      "Type": "Task",
      "Resource": "<Lambda function to generate the string OUTPUT=s3://<mybucket>/MyHiveQueryResults/$.dayid and output the Args array>",
      "Next": "Step_One"
    },
    "Step_One": {
      "Type": "Task",
      "Resource": "arn:aws:states:::elasticmapreduce:addStep.sync",
      "Parameters": {
        "ClusterId.$": "$.ClusterId",
        "Step": {
          "Name": "The first step",
          "ActionOnFailure": "CONTINUE",
          "HadoopJarStep": {
            "Jar": "command-runner.jar",
            "Args.$": "$.ArgsGeneratedByPreProcessingState"
          }
        }
      },
      "End": true
    }
  }
}
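The Lambda itself only needs to assemble the array. A minimal sketch of such a handler in Python (the event shape, the output key, and the pass-through of ClusterId are assumptions that have to line up with the state machine above; <region> and <mybucket> are placeholders as in the question):

    def lambda_handler(event, context):
        # Build the OUTPUT string from the execution input, then return the
        # complete Args array under the key the Task state reads it from.
        output = "OUTPUT=s3://<mybucket>/MyHiveQueryResults/" + str(event["dayid"])
        return {
            "ClusterId": event["ClusterId"],
            "ArgsGeneratedByPreProcessingState": [
                "hive-script",
                "--run-hive-script",
                "--args",
                "-f",
                "s3://<region>.elasticmapreduce.samples/cloudfront/code/Hive_CloudFront.q",
                "-d",
                "INPUT=s3://<region>.elasticmapreduce.samples",
                "-d",
                output,
            ],
        }

Note that ClusterId is passed through as well, because a Task state's output replaces its input by default and Step_One still needs "$.ClusterId".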
Step Functions now has intrinsic functions that can help in this situation.
"PayloadString.$": "States.Format('[[{}]]', States.JsonToString($.in.summary))",
"CmdLine.$": "States.Array('--maxp', $.params.maxpr, '--minp', $.params.minpr)"
Can't believe it took this long for these functions to become available.
See the documentation.