I want to output JSON data not wrapped in an array object. I made the changes mentioned in the Pentaho documentation, but the output is always an array, even for a single set of values. I am using PDI 9.1 and I tested using the ktr from the link below:
https://wiki.pentaho.com/download/attachments/25043814/json_output.ktr?version=1&modificationDate=1389259055000&api=v2
The statement below is from https://wiki.pentaho.com/display/EAI/JSON+output:
Another special case is when 'Nr. rows in a block' = 1.
If used with an empty JSON block name, the output will look like:
{
"name" : "item",
"value" : 25
}
My output instead comes out like this:
{ "": [ {"name":"item","value":25} ] }
I have resolved it myself. I added another JSON Input step and defined the path below to pull the single object out of the wrapping array:
$.wellDesign[0]
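For illustration (wellDesign is the block name from my transformation; the sample values are the ones from the wiki example), if the JSON Output step produces
{ "wellDesign": [ {"name":"item","value":25} ] }
then the JSON Input path $.wellDesign[0] returns the inner object on its own:
{"name":"item","value":25}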
I log messages that are JSON objects. The JSON has an array that contains key/value pairs:
{
...
"arr": [{"key": "foo", "value": "bar"}, ...],
...
}
Now I want to filter results that contain a specific key and extract the value for that key from the array.
I've tried using regex, something like parse #message /.*"key":"my_specific_key","value":(?<value>.*}).*/ which extracts the value but also returns the rest of the message. Also it doesn't filter the results.
How can I filter results and extract the values for a specific key?
If the log entries in your CloudWatch log group are actually showing up as JSON, you can just reference the key directly anywhere you would use a field
(you don't need the @ prefix; CloudWatch adds that automatically only to its own default fields, like @message).
If you are using Python, you can use aws_lambda_powertools to do this as well, in a very slick way (and it's an actual AWS product).
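For example, with the JSON from the question, something along these lines might work (arr, key and value are the field names shown above; the index 0 is an assumption that the pair you want is the first element of the array):
fields @timestamp, @message
| filter arr.0.key = "my_specific_key"
| display arr.0.value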
If they are showing up in your log as a string, then it may be an escaped string and you'll have to match it exactly, including spaces and whatnot. When you parse, you will want to do something like this:
if the string of your log message is '{"AKey" : "AValue", "Key2" : "Value2"}', then
parse @message "{\"*\" : \"*\", \"*\" : \"*\"}" as akey, akey_value, key2, key2_value
Then you can filter or count or do anything against those variables. parse is specifically a statement that matches a pattern and assigns each wildcard to a variable, one at a time, in order.
Though with a complex JSON, if your regex above works, then all you need is a filter statement:
fields @message
| parse @message ... your regex ... as value_var
| filter value_var like /some more regex/
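Put together end to end, the escaped-string version could look like this (variable names reused from the parse example above; "AKey" is just the sample key):
fields @message
| parse @message "{\"*\" : \"*\", \"*\" : \"*\"}" as akey, akey_value, key2, key2_value
| filter akey = "AKey"
| display akey_value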
If it's not a string in the log entry but actual JSON, you can just filter against the key directly:
filter a_key = "some value" (or a_key like /some regex/ here)
See https://docs.aws.amazon.com/AmazonCloudWatch/latest/logs/CWL_AnalyzeLogData-discoverable-fields.html for more info.
I am trying to copy data from a CSV file to a SQL table in Azure Data Factory.
These are the type properties for the CSV file:
"typeProperties": {
"location": {
"type": "AzureBlobStorageLocation",
"fileName": "2020-09-16-stations.csv",
"container": "container"
},
"columnDelimiter": ",",
"escapeChar": "\\",
"firstRowAsHeader": true,
"quoteChar": "\""
I receive the following error:
ErrorCode=DelimitedTextMoreColumnsThanDefined,'Type=Microsoft.DataTransfer.Common.Shared.HybridDeliveryException,Message=Error found when processing 'Csv/Tsv Format Text' source '2020-09-16-stations.csv' with row number 2: found more columns than expected column count 11.,Source=Microsoft.DataTransfer.Common,'
This is row #2
0e18d0d3-ed38-4e7f,Station2,Mainstreet33,,12207,Berlin,48.1807,11.4609,1970-01-01 01:00:00+01,"{""openingTimes"":[{""applicable_days"":96,""periods"":[{""startp"":""08:00"",""endp"":""20:00""}]},{""applicable_days"":31,""periods"":[{""startp"":""06:00"",""endp"":""20:00""}]}]}"
I think the last column, the JSON value, is causing the trouble in this case. When I preview the data it looks fine. I thought the "quoteChar": "\"" setting would prevent exactly this kind of problem with the last column. I have no idea why I am getting this error when I run debug.
Try setting the escape character = " (a double quote). This treats each pair of double quotes as a single literal double quote instead of as a quote character within the string, so you end up with a string that looks like this (and which the system knows is a single string and not something it has to split):
{"openingTimes":[{"applicable_days":96,"periods":[{"startp":"08:00","endp":"20:00"}]},
{"applicable_days":31,"periods":[{"startp":"06:00","endp":"20:00"}]}]}
This is because the value "{""openingTimes"":[{""applicable_days"":96,""periods"":[{""startp"":""08:00"",""endp"":""20:00""}]},{""applicable_days"":31,""periods"":[{""startp"":""06:00"",""endp"":""20:00""}]}]}" contains several commas and your columnDelimiter is ",", which causes the value to be split into several columns. So you need to change your columnDelimiter.
I am saving a file to blob storage in Data Factory V2. When I specify the location to save to, I call the file (for example) file1 and it saves in blob as file1, no problem. But can I use the dynamic content feature to append the datetime to the filename, so it's something like file1_01-07-2019_14-30-00 (7th Jan 14:30:00, just in case it's awkward to read)? Alternatively, can I pass the result (the filename) of the webhook activity to the next activity (the function)?
Thank you.
I couldn't get this to work without editing the copy pipeline JSON file directly (late 2018 - may not be needed anymore). You need dynamic code in the copy pipeline JSON and settings defined in the dataset for setting filename parameters.
In the dataset define 'Parameters' for folder path and/or filename (click '+ New' and give them any name you like) e.g. sourceFolderPath, sourceFileName.
Then in the dataset, under 'Connection', include the following in the 'File path' definition:
@dataset().sourceFolderPath and @dataset().sourceFileName on either side of the '/'
(see screenshot below)
In the copy pipeline, click on 'Code' in the upper right corner of the pipeline window and look for the following code under the 'blob' object you want defined by a dynamic filename. If the 'parameters' code isn't included, add it to the JSON and click the 'Finish' button. This code may be needed in 'inputs', 'outputs' or both, depending on the dynamic files you are referencing in your flow. Below is an example where the output includes the date parameter in both folder path and file name (the date is set by a Trigger parameter):
"inputs": [
{
"referenceName": "tmpDataForImportParticipants",
"type": "DatasetReference"
}
],
"outputs": [
{
"referenceName": "StgParticipants",
"type": "DatasetReference",
"parameters": {
"sourceFolderPath": {
"value": <derived value of folder path>,
"type": "Expression"
},
"sourceFileName": {
"value": <derived file name>,
"type": "Expression"
}
}
}
]
The derived value of folder path may be something like the following - this results in a folder path of yyyy/mm/dd within the specified blobContainer:
"blobContainer/@{formatDateTime(pipeline().parameters.windowStart,'yyyy')}/@{formatDateTime(pipeline().parameters.windowStart,'MM')}/@{formatDateTime(pipeline().parameters.windowStart,'dd')}"
or it could be hardcoded e.g. "blobContainer/directoryPath" - don't include '/' at start or end of definition
Derived file name could be something like the following:
"#concat(string(pipeline().parameters.'_',formatDateTime(dataset().WindowStartTime, 'MM-dd-yyyy_hh-mm-ss'))>,'.txt')"
You can include any parameter set by the Trigger, e.g. an ID value, account name, etc., by referencing it as pipeline().parameters.<parameterName>.
Dynamic Dataset Parameters example
Dynamic Dataset Connection example
Once you set up the copy activity and select your blob dataset as the sink, you need to put in a value for WindowStartTime; this can either be a plain timestamp, e.g. 1900-01-01T13:00:00Z, or you can pass a pipeline parameter into it.
Having a parameter would be more helpful if you're setting up a schedule trigger, as you will be able to set this WindowStartTime timestamp from when the trigger runs. For this you would use @trigger().scheduledTime as the value for the trigger parameter WindowStartTime.
https://learn.microsoft.com/en-us/azure/data-factory/concepts-pipeline-execution-triggers#trigger-type-comparison
You can add a dataset parameter such as WindowStartTime, which is in the format 2019-01-10T13:50:04.279Z. Then you would have something like the below for the dynamic filename:
@concat('file1_', formatDateTime(dataset().WindowStartTime, 'MM-dd-yyyy_hh-mm-ss'))
To use it in the copy activity you will also need to add a pipeline parameter.
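As a sketch, the sink dataset reference in the copy activity would then pass the value through (BlobSinkDataset and the matching pipeline parameter name WindowStartTime are placeholders; the trigger sets that pipeline parameter to @trigger().scheduledTime as described above):
"outputs": [
    {
        "referenceName": "BlobSinkDataset",
        "type": "DatasetReference",
        "parameters": {
            "WindowStartTime": {
                "value": "@pipeline().parameters.WindowStartTime",
                "type": "Expression"
            }
        }
    }
]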
I am using JMeter to test a web app.
First I perform an HTTP GET request which returns a JSON array such as:
[
    {
        "key1": {
            "subKey": [
                9.120968,
                39.255417
            ]
        },
        "key2": 1
    },
    {
        "key1": {
            "subKey": [
                9.123852,
                39.243237
            ]
        },
        "key2": 10
    }
]
Basically I want to take one element at random, take the elements of key1, and create 2 variables in JMeter that will be used for the next query (if taking one at random is not possible, then just the 1st element).
I tried using the JSON Extractor with the following settings (the example shows the single-variable case), and in the next HTTP GET request I reference the parameter as ${var1}.
How do I set up the JSON Extractor to extract a value and save it into a JMeter variable to be used in the next HTTP GET request?
The correct JSON Path query would be something like:
$..key1.subKey[${__Random(0,1,)}]
You need to switch the 'Apply to' value to either 'Main sample only' or 'Main sample and sub-samples'.
In the above setup:
Match No.: 0 - tells JMeter to take a random match of the key1 subKey values
${__Random(0,1,)} - picks a random element from the array, i.e. 9.120968 or 39.255417
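As a sketch, the JSON Extractor fields for this setup would look something like the following (var1 is the variable name from the question; the default value is just a safety net):
Names of created variables: var1
JSON Path expressions: $..key1.subKey[${__Random(0,1,)}]
Match No. (0 for Random): 0
Default Values: NOT_FOUND
You can then reference the extracted value in the next HTTP GET request as ${var1}.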
More information:
Jayway Jsonpath
API Testing With JMeter and the JSON Extractor
"JMeter variable name to use" option that you've switched on there means that you'd be examining the content of this variable INSTEAD of Sample result.
So the fix is obvious: if you intend to extract whatever you extracting from Sample result - change it back to it.
PS If you intend the opposite (process the variable content, not the sample result) - let me know please.
My question is the following:
Let's say I have a JSON file that I want to load into BigQuery.
It contains these two lines of data.
{"value":"123"}
{"value": 123 }
I have defined the following schema for my data.
[
{ "name":"value", "type":"String"}
]
When I try to load the JSON file into BigQuery, it fails with the following error:
Field:value: Could not convert value to string
Is there a way to get around this issue other than transforming the data in the json file?
Thanks!
You can set the maxBadRecords property on the load job to skip a number of errors but still load the data.
Following your example, you could still load the data if you set it as:
"configuration": {
"load": {
"maxBadRecords": 1,
}
}
This is a way to get around the issue while still loading your JSON data into the table, just that the erroneous rows will be skipped. If loading a list of files, you could set it to be a function of the number of files that you are loading (e.g. maxBadRecords = 20 * fileCount)
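For context, a fuller version of that load configuration might look like the following (the project, dataset, table and bucket names are placeholders):
"configuration": {
    "load": {
        "sourceFormat": "NEWLINE_DELIMITED_JSON",
        "sourceUris": ["gs://my-bucket/data.json"],
        "destinationTable": {
            "projectId": "my-project",
            "datasetId": "my_dataset",
            "tableId": "my_table"
        },
        "schema": {
            "fields": [
                {"name": "value", "type": "STRING"}
            ]
        },
        "maxBadRecords": 1
    }
}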