AWS Data pipeline postStepCommand unable to access INPUT1_STAGING_DIR - amazon-emr

In the EMR activity of a Data Pipeline, I am trying to use postStepCommand (as documented here) to invoke a shell script. As part of it I am trying to access the standard directory paths ${INPUT1_STAGING_DIR} and ${OUTPUT1_STAGING_DIR}.
But it seems the command is not able to access their values. Is this by design?

Related

Query blob storage with Get-AzDataLakeGen2ChildItem?

Our PowerShell test harness used to use Get-AzDataLakeGen2ChildItem to list blobs found in non-Data-Lake storage accounts. Today I updated the PowerShell and Az module versions they were locked at, and now, when issuing the command (specifying a filesystem container and a context), the following error is returned:
Get-AzDataLakeGen2ChildItem: Input string was not in a correct format.
I'm assuming something has changed and this function can no longer process results from non-Data-Lake storage.
For one reason or another, a while back we moved away from Get-AzStorageBlob, so I'm interested to know whether there is any way to keep working with this call, rather than having to deviate from Get-AzDataLakeGen2ChildItem where required.
One workaround is to list the subdirectories and files in a directory or filesystem of an Azure storage account using Get-AzDataLakeGen2ChildItem.
To do that, the storage account must have Hierarchical Namespace enabled.
Then you can list the items with something like the example below:
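A minimal sketch (the storage account and filesystem names are placeholders, not taken from the question):

# Build a context for the ADLS Gen2 account
$ctx = New-AzStorageContext -StorageAccountName "mystorageaccount" -UseConnectedAccount
# Recursively list all directories and files in the filesystem
Get-AzDataLakeGen2ChildItem -Context $ctx -FileSystem "myfilesystem" -Recurse |
    Select-Object Path, IsDirectory, Length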
NOTE: If you are using an existing storage account that does not have Hierarchical Namespace enabled, you first need to upgrade that storage account.
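If I remember correctly, the Az.Storage module exposes a cmdlet for that upgrade; a rough sketch (resource group and account names are placeholders, please verify against the linked docs below):

# Validate the account first, then perform the Hierarchical Namespace upgrade
Invoke-AzStorageAccountHierarchicalNamespaceUpgrade -ResourceGroupName "myRG" -Name "mystorageaccount" -RequestType Validation
Invoke-AzStorageAccountHierarchicalNamespaceUpgrade -ResourceGroupName "myRG" -Name "mystorageaccount" -RequestType Upgrade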
For more information, please refer to the links below:
MS Docs: Get-AzDataLakeGen2ChildItem, Get-AzStorageBlob.
SO thread for a similar issue.

How to change environment variables in ECS/Fargate?

I have a Node.js app running on ECS/Fargate. I want to set some environment variables, and from what I've read, this should be done in the Task definition. However, there does not seem to be any way to edit environment variables after the task is initially defined. When I view the task, they are not editable, and there does not seem to be any way to edit the task. Is there a way to do this?
Container solutions are built to be immutable, which means any form of change should force a new deployment. This leaves us with the option of retrieving the current task definition, updating its environment variables, and updating the service with the new definition:
aws ecs describe-task-definition --task-definition my_task_def
This retrieves the ACTIVE task definition. From there you can update the environment variables in the returned JSON and register a new task definition:
aws ecs register-task-definition \
--cli-input-json file://<path_to_json_file>/task_def.json
Then update the service:
aws ecs update-service --service my-service --task-definition my_task_def
This will pick up the latest ACTIVE task definition.
I used the CLI for illustration, but using an SDK like Boto3 might make handling the JSON much easier.
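For example, a rough Boto3 sketch of the same flow (the cluster, service, task definition, and variable names here are placeholders):

import boto3  # assumes AWS credentials and region are already configured

ecs = boto3.client("ecs")

# Fetch the current ACTIVE task definition
task_def = ecs.describe_task_definition(taskDefinition="my_task_def")["taskDefinition"]

# Update the environment variables of the first container definition
task_def["containerDefinitions"][0]["environment"] = [
    {"name": "MY_VAR", "value": "new-value"},
]

# describe_task_definition returns read-only fields that register_task_definition
# rejects, so keep only the registerable ones
allowed = {
    "family", "taskRoleArn", "executionRoleArn", "networkMode",
    "containerDefinitions", "volumes", "placementConstraints",
    "requiresCompatibilities", "cpu", "memory",
}
new_def = ecs.register_task_definition(**{k: v for k, v in task_def.items() if k in allowed})

# Point the service at the new revision; ECS rolls out replacement tasks
ecs.update_service(
    cluster="my-cluster",
    service="my-service",
    taskDefinition=new_def["taskDefinition"]["taskDefinitionArn"],
)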

Executing Pentaho transformation(ktr) using node js with Pentaho CE

I am able to successfully execute the .ktr files using the browser, as well as the Postman tool, with the URL below:
http://localhost:8089/kettle/executeTrans/?trans=D:\Pentaho\ktr\MyJson_to_Database.ktr
But I want to automate the process, and the ktr needs to accept a JSON file as input (right now the JSON data is inside the ktr file itself). As I am using Node.js to automate the ktr execution, I am trying to use wreck and the POST method to execute it (I am new to wreck), and I am having difficulty identifying whether the error is due to wreck or to the Kettle transformation itself.
In the meantime, I am trying to execute it without passing the path as a query string in the URL; instead I want to pass it in the body. I have searched Google with no success so far.
EDIT 1
I am able to reach the ktr file from the Node.js microservice, and now the challenge is to read the file path inside the Docker image.
Could you work with storing the JSON data in a file, and modifying the transformation (or adding one) to read that file and pass the information it contains along?
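A rough sketch of that idea in Node.js (the file path, Carte URL, and credentials are placeholders; @hapi/wreck is assumed, older projects may use the wreck package instead):

'use strict';

const Fs = require('fs');
const Wreck = require('@hapi/wreck'); // older projects: require('wreck')

async function runTransformation(jsonData) {
    // 1. Store the JSON data in a file that the .ktr is configured to read.
    //    If Carte runs inside Docker, this path must sit on a volume the container can see.
    Fs.writeFileSync('/data/input/myjson_input.json', JSON.stringify(jsonData));

    // 2. Trigger the transformation through Carte's executeTrans service
    const url = 'http://localhost:8089/kettle/executeTrans/?trans='
        + encodeURIComponent('D:\\Pentaho\\ktr\\MyJson_to_Database.ktr');

    const { res, payload } = await Wreck.get(url, {
        headers: {
            // Carte's default credentials are cluster/cluster; adjust as needed
            authorization: 'Basic ' + Buffer.from('cluster:cluster').toString('base64')
        }
    });

    console.log(res.statusCode, payload.toString());
}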

Reading yaml properties file from S3

I have a YAML properties file stored in an S3 bucket. In Mule 4 I can read this file using the S3 connector. I need to use the properties defined in this file in DB connectors (reading dynamic values and using them in Mule 4). I am not able to create properties from this file such that I can use them as, for example, ${dbUser} in a Mule configuration or flow. Any guidance on how I can accomplish this?
You will not be able to use the S3 connector to do that. The connector can read the file in an operation at execution time, but properties placeholders, like ${dbUser}, have to be defined earlier, at deployment time.
You might be able to read the value into a variable (for example: #[vars.dbUser]) and use the variable in the database connector configuration. That is called a dynamic configuration, because it is evaluated dynamically at execution time.
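A minimal sketch of what that could look like (the MySQL connection details are placeholders, and the namespace declarations plus the S3-read/YAML-parse steps that populate the variables are omitted):

<!-- Dynamic configuration: the expressions are evaluated per execution -->
<db:config name="dynamicDbConfig">
    <db:my-sql-connection host="dbhost" port="3306" database="mydb"
                          user="#[vars.dbUser]" password="#[vars.dbPassword]"/>
</db:config>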

How to store auto-generated files in a different AWS S3 folder while running Tableau using the Athena connector?

I am using Athena to connect a single CSV file stored in an AWS S3 folder to Tableau Desktop, and I have been successful in connecting to the S3 data using Athena.
However, when I perform any activity in Tableau, like drag and drop or slice and dice, an auto-generated CSV and a metadata file get saved in the same folder as my input file for each activity.
Because these additional files get auto-generated in the same input-file folder, the visuals in Tableau are also affected (due to the additional records).
How do I ensure that, for any activity I perform in Tableau, the auto-generated files get stored in a different folder (rather than the folder the input file is read from)?
This will solve my problem as the visuals and the analysis will show correct numbers.
Currently, the workaround I am using is: after every activity I perform in Tableau (slice, filter, etc.), I go back to the S3 folder, delete the additional files that were auto-generated, continue with the activity in Tableau, go back to the S3 folder for deletion, and so on. (Definitely not the ideal way.)
While executing Athena queries, I store the query results in a different folder, because there is a provision for doing that.
Please suggest whether there is a similar provision for storing the auto-generated files (while working in Tableau) in a different folder.
P.S. If there is an option to prevent these files from being generated, that would also be helpful.
Anand
How do I ensure that the auto-generated files get stored in a different folder?
In order to store the results of your queries in a different location, you need to specify a different path for the S3 Staging Directory. To do that, you need to use Edit Connection on the AWS Athena data source.
Here we did everything within Tableau itself, but the same result can be accomplished in the AWS Athena settings for the query result location.
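For example, assuming the default primary workgroup and a placeholder bucket, the query result location can also be changed from the AWS CLI:

aws athena update-work-group \
    --work-group primary \
    --configuration-updates "ResultConfigurationUpdates={OutputLocation=s3://my-tableau-results-bucket/athena-results/}"

Keep in mind that the workgroup-level location only takes precedence over the connection's S3 Staging Directory if the workgroup is configured to override client-side settings.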
If there is an option of preventing these files from getting generated, that will also be helpful.
On the left side of the toolbar, there is a Pause/Resume Auto Updates option. When paused, Tableau doesn't send new queries to AWS Athena.