Dynamically Append datetime to filename during copy activity or when specifying name in blob dataset - blob

I am saving a file to blob storage in Data Factory V2. When I specify the location to save to, I call the file (for example) file1 and it saves in blob as file1, no problem. But can I use the dynamic content feature to append the datetime to the filename so it's something like file1_01-07-2019_14-30-00? (7th Jan 14:30:00, in case it's awkward to read.) Alternatively, can I output the result (the filename) of the webhook activity to the next activity (the function)?
Thank you.

I couldn't get this to work without editing the copy pipeline JSON directly (this was late 2018 - it may not be needed anymore). You need dynamic code in the copy pipeline JSON, plus parameters defined in the dataset for the folder path and file name.
In the dataset, define 'Parameters' for the folder path and/or file name (click '+ New' and give them any name you like), e.g. sourceFolderPath and sourceFileName.
Then, in the dataset under 'Connection', include the following in the 'File path' definition:
@dataset().sourceFolderPath and @dataset().sourceFileName on either side of the '/'
(see the screenshots at the end of this answer)
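For reference, the dataset JSON behind those settings ends up looking roughly like the sketch below (the dataset and parameter names match the example above; the linked service name MyBlobStorage is just a placeholder):
{
    "name": "StgParticipants",
    "properties": {
        "type": "AzureBlob",
        "linkedServiceName": {
            "referenceName": "MyBlobStorage",
            "type": "LinkedServiceReference"
        },
        "parameters": {
            "sourceFolderPath": { "type": "String" },
            "sourceFileName": { "type": "String" }
        },
        "typeProperties": {
            "format": { "type": "TextFormat" },
            "folderPath": {
                "value": "@dataset().sourceFolderPath",
                "type": "Expression"
            },
            "fileName": {
                "value": "@dataset().sourceFileName",
                "type": "Expression"
            }
        }
    }
}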
In the copy pipeline, click 'Code' in the upper right corner of the pipeline window and look for the following code under the 'blob' object you want defined by a dynamic filename. If the 'parameters' code isn't included, add it to the JSON and click the 'Finish' button. This code may be needed in 'inputs', 'outputs' or both, depending on the dynamic files you are referencing in your flow. Below is an example where the output includes the date parameter in both the folder path and the file name (the date is set by a Trigger parameter):
"inputs": [
{
"referenceName": "tmpDataForImportParticipants",
"type": "DatasetReference"
}
],
"outputs": [
{
"referenceName": "StgParticipants",
"type": "DatasetReference",
"parameters": {
"sourceFolderPath": {
"value": <derived value of folder path>,
"type": "Expression"
},
"sourceFileName": {
"value": <derived file name>,
"type": "Expression"
}
}
}
]
The derived value of the folder path may be something like the following - this results in a folder path of yyyy/mm/dd within the specified blobContainer:
"blobContainer/@{formatDateTime(pipeline().parameters.windowStart,'yyyy')}/@{formatDateTime(pipeline().parameters.windowStart,'MM')}/@{formatDateTime(pipeline().parameters.windowStart,'dd')}"
Or it could be hardcoded, e.g. "blobContainer/directoryPath" - don't include a '/' at the start or end of the definition.
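For example, assuming windowStart is 2019-01-07T14:30:00Z, the expression above resolves to the folder path blobContainer/2019/01/07.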
The derived file name could be something like the following, where <fileNamePrefix> stands for whichever pipeline parameter you use as the base of the name:
"@concat(string(pipeline().parameters.<fileNamePrefix>), '_', formatDateTime(pipeline().parameters.windowStart, 'MM-dd-yyyy_hh-mm-ss'), '.txt')"
You can include any parameter set by the Trigger, e.g. an ID value, account name, etc., by referencing it as pipeline().parameters.<parameterName>.
(Screenshot: Dynamic Dataset Parameters example)
(Screenshot: Dynamic Dataset Connection example)

Once you set up the copy activity and select your blob dataset as the sink, you need to put in a value for WindowStartTime; this can either be a plain timestamp, e.g. 1900-01-01T13:00:00Z, or a pipeline parameter.
Having a parameter is more helpful if you're setting up a schedule trigger, as you can then feed WindowStartTime from the time the trigger runs. For this you would use @trigger().scheduledTime as the value for the trigger parameter WindowStartTime.
https://learn.microsoft.com/en-us/azure/data-factory/concepts-pipeline-execution-triggers#trigger-type-comparison
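A rough sketch of the relevant part of a schedule trigger definition would then be (the pipeline name MyCopyPipeline is only a placeholder):
"pipelines": [
    {
        "pipelineReference": {
            "referenceName": "MyCopyPipeline",
            "type": "PipelineReference"
        },
        "parameters": {
            "WindowStartTime": "@trigger().scheduledTime"
        }
    }
]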

You can add a dataset parameter such as WindowStartTime, which is in the format 2019-01-10T13:50:04.279Z. Then you would have something like the below for the dynamic filename:
@concat('file1_', formatDateTime(dataset().WindowStartTime, 'MM-dd-yyyy_hh-mm-ss'))
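Note that 'hh' is the 12-hour clock; assuming you want the 24-hour form from the question (14-30-00), 'HH' should give you that:
@concat('file1_', formatDateTime(dataset().WindowStartTime, 'MM-dd-yyyy_HH-mm-ss'))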
To use this in the copy activity you will also need to add a pipeline parameter, as sketched below.
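A minimal sketch of that wiring in the copy activity's sink (the dataset name MyBlobDataset is illustrative):
"outputs": [
    {
        "referenceName": "MyBlobDataset",
        "type": "DatasetReference",
        "parameters": {
            "WindowStartTime": {
                "value": "@pipeline().parameters.WindowStartTime",
                "type": "Expression"
            }
        }
    }
]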

Related

Need Pentaho JSON without array

I wanted to output JSON data not as an array object, and I made the changes mentioned in the Pentaho documentation, but the output is always an array, even for a single set of values. I am using PDI 9.1 and I tested using the ktr from the link below:
https://wiki.pentaho.com/download/attachments/25043814/json_output.ktr?version=1&modificationDate=1389259055000&api=v2
The statement below is from https://wiki.pentaho.com/display/EAI/JSON+output:
Another special case is when 'Nr. rows in a block' = 1.
If used with an empty JSON block name, the output will look like:
{
"name" : "item",
"value" : 25
}
My output, however, comes out like this:
{ "": [ {"name":"item","value":25} ] }
I have resolved this myself. I added another JSON Input step and defined it as below:
$.wellDesign[0] to get the array as a string object
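For illustration, assuming the intermediate JSON is wrapped with the block name wellDesign, i.e. {"wellDesign":[{"name":"item","value":25}]}, the path $.wellDesign[0] selects the first (and only) array element, so the downstream step ends up with just:
{ "name": "item", "value": 25 }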

Is it possible to add the description or other custom field to query result log?

I have the following scheduled query in combination with a TLS plugin logger.
"vssadmin.exe": {
"query": "select * from file WHERE directory = 'C:\\Windows\\Prefetch\\' and filename like '%vssadmin%';",
"interval": 600,
"description": "Vssadmin Execute, usaullay used to execute activity on Volume Shadow copy",
"platform": "windows"
},
I'd like to add the description field to the result output log of this specific query, so I can use it to map my queries to a framework. Unfortunately, the documentation doesn't mention such an option. Is it possible to add the description or another custom field to the logged output?
Like this?
Tag your #osquery queries/logs with MITRE ATT&CK IDs like so:
SELECT username,shell, 'T1136' AS attckID FROM users;
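Applied to the scheduled query above, the same trick might look like this sketch (the added column name and text are just examples; a constant string selected with an alias simply shows up as an extra column in every result row):
"vssadmin.exe": {
    "query": "SELECT *, 'Vssadmin Execute, usually used to execute activity on Volume Shadow Copy' AS description FROM file WHERE directory = 'C:\\Windows\\Prefetch\\' AND filename LIKE '%vssadmin%';",
    "interval": 600,
    "platform": "windows"
},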

JMeter pass JSON response value to next request

I am using JMeter to test a web app.
First I perform an HTTP GET request which returns a JSON array such as:
[
    {
        "key1": {
            "subKey": [
                9.120968,
                39.255417
            ]
        },
        "key2": 1
    },
    {
        "key1": {
            "subKey": [
                9.123852,
                39.243237
            ]
        },
        "key2": 10
    }
]
Basically I want to take one element at random, take the values under key1, and create 2 variables in JMeter to be used in the next query (if picking randomly is not possible, then just the 1st element).
I tried using a JSON Extractor with the following settings (the example shows the single-variable case):
and in the next HTTP GET request I reference the parameter as ${var1}.
How do I set up the JSON Extractor to extract a value and save it into a JMeter variable to be used in the next HTTP GET request?
The correct JSON Path query would be something like:
$..key1.subKey[${__Random(0,1,)}]
You also need to switch the 'Apply to' value to either Main sample only or Main sample and sub-samples.
In the above setup:
Match No.: 0 - tells JMeter to pick a random match from the values extracted under key1.subKey
${__Random(0,1,)} - picks a random element from the subKey array, i.e. 9.120968 or 39.255417
More information:
Jayway Jsonpath
API Testing With JMeter and the JSON Extractor
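If you need both coordinates, one possible JSON Extractor setup is to extract two variables at once and fall back to the first element, as the question allows (the names lat and lon are just illustrative):
Names of created variables: lat;lon
JSON Path expressions: $[0].key1.subKey[0];$[0].key1.subKey[1]
Match No. (0 for Random): 1;1
Default Values: NOT_FOUND;NOT_FOUND
The next HTTP GET request can then reference them as ${lat} and ${lon}.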
"JMeter variable name to use" option that you've switched on there means that you'd be examining the content of this variable INSTEAD of Sample result.
So the fix is obvious: if you intend to extract whatever you extracting from Sample result - change it back to it.
PS If you intend the opposite (process the variable content, not the sample result) - let me know please.

How can I create dynamic destination files name based on what is filtered?

For example, if something like [xxx] appears in my log line, I must put this message in a file whose name starts with xxx, i.e. xxx.log.
And if the message changes and [xxy] appears, I must create a new log file named xxy.log.
How can I do that in a syslog-ng config file?
To filter for specific messages, you can use filter expressions in syslog-ng; regular expressions are allowed in the filter as well.
To use the result of the match in the filename, try using a named pattern in the filter expression:
filter f_myfilter {message("(?<name>pattern)");};
Then you can use the named match in the destination template:
destination d_file {
file ("/var/log/${name}.log");
};
Let me know if it works, I haven't had the time to test it.
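For completeness, the filter and destination still need to be connected in a log path; a minimal sketch, assuming an existing source called s_local and that the text between the square brackets is what you want to capture, could be:
filter f_myfilter {
    # capture the tag between the square brackets into ${name}
    message("\\[(?<name>[A-Za-z0-9]+)\\]" type("pcre"));
};
destination d_file {
    file("/var/log/${name}.log");
};
log {
    source(s_local);
    filter(f_myfilter);
    destination(d_file);
};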
I found this way to resolve my problem.
parser p_apache {
csv-parser(columns("MY.ALGO", "MY.MOSTRAR", "MY.OTRA")
delimiters("|")
);
};
destination d_file {
file("/var/log/syslog-ng/$YEAR-$MONTH/$DAY/messages-${MY.ALGO:-nouser}.log");
};
Regex is the answer here.
E.g. I have file names like access2018-10-21.log as the source, so my access log source file entry becomes:
file("/opt/liferay-portal-6.2-ee-sp13/tomcat-7.0.62/logs/access[0-9][0-9][0-9][0-9]\-[0-9][0-9]\-[0-9][0-9].log" follow_freq(1) flags(no-parse));

aws data pipeline datetime variable

I am using AWS Data Pipeline to save a text file to my S3 bucket from RDS. I would like the file name to have the date and the hour in it, like:
myfile-YYYYMMDD-HH.txt
myfile-20140813-12.txt
I have specified my S3DataNode FilePath as:
s3://mybucketname/out/myfile-#{format(myDateTime,'YYYY-MM-dd-HH')}.txt
When I try to save my pipeline I get the following error:
ERROR: Unable to resolve myDateTime for object:DataNodeId_xOQxz
According to the AWS Data Pipeline documentation for date and time functions this is the proper syntax for using the format function.
When I save the pipeline using a hard-coded date and time, I don't get this error and my file appears in my S3 bucket and folder as expected.
My thinking is that I need to define "myDateTime" somewhere or use something like NOW().
Can somebody tell me how to set "myDateTime" to the current time (e.g. NOW), or give me a workaround so I can format the current time to be used in my FilePath?
I am not aware of an exact equivalent of NOW() in Data Pipeline. I tried using makeDate with no arguments (just for fun) to see if that worked... it did not.
The closest are the runtime variables @scheduledStartTime, @actualStartTime and @reportProgressTime.
http://docs.aws.amazon.com/datapipeline/latest/DeveloperGuide/dp-object-s3datanode.html
The following, for example, should work:
s3://mybucketname/out/myfile-#{format(@scheduledStartTime,'YYYY-MM-dd-HH')}.txt
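In context, the S3DataNode definition would look roughly like this sketch (the id and the schedule reference are placeholders):
{
    "id": "MyS3OutputNode",
    "type": "S3DataNode",
    "schedule": { "ref": "DefaultSchedule" },
    "filePath": "s3://mybucketname/out/myfile-#{format(@scheduledStartTime,'YYYY-MM-dd-HH')}.txt"
}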
Just for fun, here is some more info on Parameters.
At the end of your pipeline JSON (click List Pipelines, select one, click Edit Pipeline, then click Export), you need to add a Parameters and/or Values object.
I use a myStartDate for backfill processes, which you can manipulate once it is passed in for ad hoc runs. You can give it a static default, but you can't set it to a dynamic value, so it is of limited use for regularly scheduled tasks. For realtime/scheduled dates, you need to use @scheduledStartTime, etc., as suggested. Here is a sample of setting up some Parameters and/or Values. Both show up under Parameters in the UI. These values can be used throughout your pipeline activities (shell, hive, etc.) with the #{myVariableToUse} notation.
"parameters": [
{
"helpText": "Put help text here",
"watermark": "This shows if no default or value set",
"description": "Label/Desc",
"id": "myVariableToUse",
"type": "string"
}
]
And for Values:
"values": {
"myS3OutLocation": "s3://some-bucket/path",
"myThreshold": "30000",
}
You cannot add these directly in the UI (yet) but once they are there you can change and save the values.
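Once they are defined, you can reference them anywhere an expression is allowed; purely as an illustration, an S3DataNode filePath could combine a value with a runtime variable like this:
"filePath": "#{myS3OutLocation}/myfile-#{format(@scheduledStartTime,'YYYY-MM-dd-HH')}.txt"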