Replace double quotes with single quotes using awk

I have a JSON file, with some data like this:
{"last_modified": {"type": "/type/datetime", "value": "2008-04-01T03:28:50.625462"}, "type": {"key": "/type/author"}, "name": "National Research Council. Committee on the Scientific and Technologic Base of Puerto Rico"s Economy.", "key": "/authors/OL2108538A", "revision": 1}
The value of name contains a double quote. I only want to replace that double quote with a single quote, not all of the double quotes. This is my attempt; please tell me how to fix it:
BEGIN {
    q = "\""
    FS = OFS = q ", " q
}
{
    split($1, arr, ": " q)
    for (i in arr) {
        if (arr[i] == "name") {
            gsub(q, "'", arr[i+1])
            # print arr[1] ": " q arr[2], $2, $3
        }
    }
}

One way, using octal escapes (\042 is a double quote, \047 is a single quote):
awk '{for(i=1;i<=NF;i++) if($i~/name/){ gsub("\042","\047",$(i+1)) }}1' file
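With awk's default whitespace splitting, that one-liner only touches the single word right after the name field. A fuller sketch that isolates the whole name value first (assuming one JSON object per line and that the value itself never contains the ", " delimiter that separates keys):
awk '{
    # find the opening quote of the name value
    if (match($0, /"name": "/)) {
        pre  = substr($0, 1, RSTART + RLENGTH - 1)   # up to and including the opening quote
        rest = substr($0, RSTART + RLENGTH)
        # the value ends where the next ", " key delimiter begins
        if (match(rest, /", "/)) {
            val  = substr(rest, 1, RSTART - 1)
            post = substr(rest, RSTART)
            gsub(/"/, "\047", val)                   # \047 is a single quote
            $0 = pre val post
        }
    }
    print
}' file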

Need an optimized way to get the required output

Is there an optimized way to trim the leading and trailing blank spaces from the field values in the data array below? I have used three approaches, but need a more optimized way.
Note: there might be more than 20 objects in the data array and more than 50 fields in each object. The payload below is just a sample; field values can be digits, strings, or dates of any size.
{
  "School": "XYZ High school",
  "data": [
    {
      "student Name": "XYZ ",
      "dateofAdmission": "2021-06-09 ",
      "percentage": "89 "
    },
    {
      "student Name": "ABC ",
      "dateofAdmission": "2021-08-04 ",
      "percentage": "90 "
    },
    {
      "student Name": "PQR ",
      "dateofAdmission": "2021-10-01 ",
      "percentage": "88 "
    }
  ]
}
Required output:
{
  "School": "XYZ High school",
  "data": [
    {
      "student Name": "XYZ",
      "dateofAdmission": "2021-06-09",
      "percentage": "89"
    },
    {
      "student Name": "ABC",
      "dateofAdmission": "2021-08-04",
      "percentage": "90"
    },
    {
      "student Name": "PQR",
      "dateofAdmission": "2021-10-01",
      "percentage": "88"
    }
  ]
}
Three approaches I've used:
First approach:
%dw 2.0
output application/json
// variable to remove leading and trailing blank spaces from values in key:value pairs in data
var payload1 = payload.data map ((value, key) ->
    value mapObject (($$): trim($)))
---
// construct the payload
payload - "data" ++ data: payload1 map ((item, index) -> {
    (item)
})
Second approach:
%dw 2.0
output application/json
---
payload - "data" ++ data: payload.data map ((value, key) ->
    value mapObject (($$): trim($))) map ((item, index) -> {
    (item)
})
Third approach:
%dw 2.0
output application/json
---
{
    "Name": payload.School,
    "data": payload.data map ($ mapObject (($$): trim($)))
}
Another solution using the update operator:
%dw 2.0
output application/json
---
payload update {
    case data at .data ->
        data map ($ mapObject ((value, key, index) -> (key): trim(value)))
}
Note that your first two solutions are exactly the same, and both have an unneeded map() at the end that doesn't seem to serve any purpose. The third solution is very similar but uses an incorrect key name for the school name (Name instead of School as in the example). Other than those minor issues, there's nothing particularly wrong with any of the solutions.
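If some field values could arrive as numbers or other non-strings rather than the quoted strings in the sample, a guarded variant along the same lines (a sketch) trims only actual strings:
%dw 2.0
output application/json
---
payload update {
    case data at .data ->
        data map ($ mapObject ((value, key) ->
            (key): if (value is String) trim(value) else value))
}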

DataWeave 2 update function not working for a few entries?

I want to update 3 fields in my array payload:
totalSpendAmount
price
lineAmount
My script is as follows:
%dw 2.0
output application/json
---
payload update {
    case .IntegrationEntities.integrationEntity -> $ map {
        ($ update {
            case .integrationEntityDetails.contractUtilization.items.item -> $ map {
                ($ update {
                    case .price -> if ($ as Number < 1) "0" ++ $ else $
                    case .lineAmount -> if ($ as Number < 1) "0" ++ $ else $
                })
            }
            case totalSpendAmount at .integrationEntityDetails.contractUtilization -> totalSpendAmount update {
                case totalSpendAmount at .totalSpendAmount -> if (totalSpendAmount as Number < 1) "0" ++ totalSpendAmount else totalSpendAmount
            }
        })
    }
}
If I run the above script, only totalSpendAmount gets updated. If I remove the totalSpendAmount case block, the price and lineAmount fields update correctly.
What is wrong in my script?
My payload is:
{
  "IntegrationEntities": {
    "integrationEntity": [
      {
        "integrationEntityHeader": {
          "integrationTrackingNumber": "XXXX",
          "referenceCodeForEntity": "132804",
          "additionalInfo": "ADDITIONALINFO"
        },
        "integrationEntityDetails": {
          "contractUtilization": {
            "externalId": "417145",
            "utilizationType": "INVOICE",
            "isDelete": "No",
            "documentNumber": "132804",
            "documentDescription": "",
            "documentDate": "2021-03-26",
            "totalSpendAmount": ".92",
            "documentCurrency": "AUD",
            "createdBy": "Oracle Integration",
            "status": "FULLY PAID",
            "items": {
              "item": [
                {
                  "lineItemId": "132804_1",
                  "contractNumber": "YYYYYYY",
                  "contractLineId": "",
                  "lineNumber": "1",
                  "name": "132804",
                  "description": "132804",
                  "quantity": "1",
                  "price": ".92",
                  "lineAmount": ".92",
                  "purchaseOrderNumber": "YYYYYY",
                  "purchaseOrderDescription": ""
                },
                {
                  "lineItemId": "132804_2",
                  "contractNumber": "YYYYYYY",
                  "contractLineId": "",
                  "lineNumber": "1",
                  "name": "132804",
                  "description": "132804_2",
                  "quantity": "1",
                  "price": ".95",
                  "lineAmount": ".95",
                  "purchaseOrderNumber": "YYYYYY",
                  "purchaseOrderDescription": ""
                }
              ]
            }
          }
        }
      }
    ]
  }
}
The output I am looking for is:
{
  "IntegrationEntities": {
    "integrationEntity": [
      {
        "integrationEntityHeader": {
          "integrationTrackingNumber": "XXXX",
          "referenceCodeForEntity": "132804",
          "additionalInfo": "ADDITIONALINFO"
        },
        "integrationEntityDetails": {
          "contractUtilization": {
            "externalId": "417145",
            "utilizationType": "INVOICE",
            "isDelete": "No",
            "documentNumber": "132804",
            "documentDescription": "",
            "documentDate": "2021-03-26",
            "totalSpendAmount": "0.92",
            "documentCurrency": "AUD",
            "createdBy": "Oracle Integration",
            "status": "FULLY PAID",
            "items": {
              "item": [
                {
                  "lineItemId": "132804_1",
                  "contractNumber": "YYYYYYY",
                  "contractLineId": "",
                  "lineNumber": "1",
                  "name": "132804",
                  "description": "132804",
                  "quantity": "1",
                  "price": "0.92",
                  "lineAmount": "0.92",
                  "purchaseOrderNumber": "YYYYYY",
                  "purchaseOrderDescription": ""
                },
                {
                  "lineItemId": "132804_2",
                  "contractNumber": "YYYYYYY",
                  "contractLineId": "",
                  "lineNumber": "1",
                  "name": "132804",
                  "description": "132804_2",
                  "quantity": "1",
                  "price": "0.95",
                  "lineAmount": "0.95",
                  "purchaseOrderNumber": "YYYYYY",
                  "purchaseOrderDescription": ""
                }
              ]
            }
          }
        }
      }
    ]
  }
}
Try with this script:
%dw 2.0
output application/json
---
payload.IntegrationEntities.integrationEntity.integrationEntityDetails.contractUtilization map ((cu, index) -> cu update {
    case .totalSpendAmount if ($ as Number < 1) -> "0" ++ $
    case .items.item -> $ map {
        ($ update {
            case .price -> if ($ as Number < 1) "0" ++ $ else $
            case .lineAmount -> if ($ as Number < 1) "0" ++ $ else $
        })
    }
})
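(Note that this expression outputs only the updated contractUtilization objects, not the whole payload; the updated scripts below preserve the full structure.)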
Updated Scripts:
Approach 1
%dw 2.0
output application/json
---
payload update {
    case .IntegrationEntities.integrationEntity -> $ map {
        ($ update {
            case .integrationEntityDetails.contractUtilization -> $ update {
                case .totalSpendAmount -> if ($ as Number < 1) "0" ++ $ else $
                case .items.item -> $ map ((cuItem, index) -> cuItem update {
                    case .price -> if ($ as Number < 1) "0" ++ $ else $
                    case .lineAmount -> if ($ as Number < 1) "0" ++ $ else $
                })
            }
        })
    }
}
Approach 2
%dw 2.0
output application/json
---
payload update {
    case .IntegrationEntities.integrationEntity[0].integrationEntityDetails.contractUtilization -> $ update {
        case .totalSpendAmount -> if ($ as Number < 1) "0" ++ $ else $
        case .items.item -> $ map ((cuItem, index) -> cuItem update {
            case .price -> if ($ as Number < 1) "0" ++ $ else $
            case .lineAmount -> if ($ as Number < 1) "0" ++ $ else $
        })
    }
}
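(Note that Approach 2 addresses only the first integrationEntity through the [0] index; if the array can contain more than one entity, Approach 1 is the safer choice.)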

Building a pandas condition query using a loop

I have a filters object which gives me the conditions to be applied to a dataframe, as shown below:
"filters": [
{
"dimension" : "dimension1",
"operator" : "IN",
"value": ["value1", "value2", "value3"],
"conjunction": None
},
{
"dimension" : "dimension2",
"operator" : "NOT IN",
"value": ["value1", "value2", "value3"],
"conjunction": "OR"
},
{
"dimension" : "dimension3",
"operator" : ">=",
"value": ["value1", "value2", "value3"],
"conjunction": None
},
{
"dimension" : "dimension4",
"operator" : "==",
"value": ["value1", "value2", "value3"],
"conjunction": "AND"
},
{
"dimension" : "dimension5",
"operator" : "<=",
"value": ["value1", "value2", "value3"],
"conjunction": None
},
{
"dimension" : "dimension6",
"operator" : ">",
"value": ["value1", "value2", "value3"],
"conjunction": "OR"
},
]
Here is the logic I used to build the SQL query:
conditionString = ""
for eachFilter in filters:
    dimension = eachFilter["dimension"]
    operator = eachFilter["operator"]
    value = eachFilter["value"]
    conjunction = eachFilter["conjunction"]
    if len(eachFilter["value"]) == 1:
        value = value[0]
        if operator not in ("IN", "NOT IN"):
            conditionString += f' {dimension} {operator} {value} {conjunction}'
        else:
            conditionString += f' {dimension} {operator} ({value}) {conjunction}'
    else:
        value = ", ".join(value)
        if operator not in ("IN", "NOT IN"):
            conditionString += f' {dimension} {operator} {value} {conjunction}'
        else:
            conditionString += f' {dimension} {operator} ({value}) {conjunction}'
But when it comes to pandas I can't use such queries, so I wanted to know if there's a good way to loop over these filter conditions and apply them. Note that these are the only conditions I will be operating with.
In case of None as the conjunction, it should be treated as "AND".
I used the eval function to build up nested conditional-filtering statements for pandas and then evaluated them all at the end, as shown below:
evalString = ""
filterCheck = 0
for eachFilter in filtersArray:
    valueString = ""
    values = eachFilter[self.queryBuilderMap["FILTERS_MAP"]["VALUE"]]
    dimension = eachFilter[self.queryBuilderMap["FILTERS_MAP"]["DIMENSION"]]
    conjunction = self.defineConjunction(eachFilter[self.queryBuilderMap["FILTERS_MAP"]["CONJUNCTION"]])
    if filterCheck == len(filtersArray) - 1:
        conjunction = ""
    if (eachFilter[self.queryBuilderMap["FILTERS_MAP"]["OPERATOR"]]).lower() == "in":
        for eachValue in values:
            valueString += f"(df['{dimension}'] == {eachValue}) {conjunction} "
        evalString += valueString
    elif (eachFilter[self.queryBuilderMap["FILTERS_MAP"]["OPERATOR"]]).lower() == "not in":
        for eachValue in values:
            valueString += f"(df['{dimension}'] != {eachValue}) {conjunction} "
        evalString += valueString
    else:
        for eachValue in values:
            valueString += f"(df['{dimension}'] {eachFilter[self.queryBuilderMap['FILTERS_MAP']['OPERATOR']]} {eachValue}) {conjunction} "
        evalString += valueString
    filterCheck += 1
    print(valueString)
    # print(evalString)
df = eval(f'df.loc[{evalString}]')
# print(df.keys())
return df
Here FILTERS_MAP is the dictionary mapping keys to the filter field names:
"FILTERS_MAP": {
    "DIMENSION": "dimension",
    "OPERATOR": "operator",
    "VALUE": "value",
    "CONJUNCTION": "conjunction",
    "WRAPPER": "wrapper"
}
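For reference, the same filters can be applied without eval by mapping the comparison operators to functions and combining boolean masks. A minimal sketch, assuming each filter's conjunction joins it to the previous condition, that None means "AND" as stated, and that the comparison operators take the first value in the list:
import operator as op
import pandas as pd

# map comparison-operator strings to functions so no eval() is needed
OPS = {">=": op.ge, "<=": op.le, ">": op.gt, "<": op.lt, "==": op.eq, "!=": op.ne}

def build_mask(df: pd.DataFrame, filters: list) -> pd.Series:
    mask = None
    for f in filters:
        dimension, oper, values = f["dimension"], f["operator"], f["value"]
        conjunction = f["conjunction"] or "AND"  # None defaults to AND
        if oper == "IN":
            cond = df[dimension].isin(values)
        elif oper == "NOT IN":
            cond = ~df[dimension].isin(values)
        else:
            cond = OPS[oper](df[dimension], values[0])
        # fold the condition into the running mask with this filter's conjunction
        mask = cond if mask is None else (mask | cond if conjunction == "OR" else mask & cond)
    return mask

# usage: filtered = df[build_mask(df, filters)]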

How to update deeply nested JSON object based on filter criteria in Postgres?

I have a table mapping_transform with a JSONB column content_json containing something like
{
  "meta": {...},
  "mapping": [
    ...,
    {
      "src": "up",
      "dest": "down",
      ...
    },
    ...
  ]
}
I want to add a new JSON entry ("rule_names": [ "some name" ]) to the JSON object matching src = up and dest = down, which would result in
{
  "meta": {...},
  "mapping": [
    ...,
    {
      "src": "up",
      "dest": "down",
      ...,
      "rule_names": [ "some name" ]
    },
    ...
  ]
}
The following query returns the JSON object that meets the filter requirements:
WITH elems AS (SELECT json_array_elements(content_json->'mapping') from mapping_transform)
SELECT * FROM elems WHERE json_array_elements->>'src' = 'up' and json_array_elements->>'dest' = 'down';
-- Alternative
SELECT mt_entry
FROM mapping_transform,
LATERAL jsonb_array_elements(content_json::jsonb->'mapping') mt_entry
WHERE mt_entry->>'src' = 'up' and mt_entry->>'dest' = 'down';
My problem now is that I do not know how to add the new entry to the specific object. I tried something like
WITH elems AS (SELECT json_array_elements(content_json->'mapping') FROM mapping_transform),
     results AS (SELECT * FROM elems WHERE json_array_elements->>'src' = 'up' AND json_array_elements->>'dest' = 'down')
UPDATE mapping_transform
SET content_json = jsonb_set(results, '{"rule_names"}', '["some name"]'); -- this does obviously not work
but that does not execute, as results is an unknown column. I would also need to merge the result of the jsonb_set with the rest of content_json before assigning it, because otherwise it would overwrite the whole content.
How can I update specific deeply nested JSON objects based on filter criteria?
If I had a well-defined path to the object I want to update, things would be much easier. But since the target object lies within a JSON array at an arbitrary position, finding and updating it is much more difficult.
If you are familiar with JavaScript you'll be happy to install and use the JavaScript procedural language plv8. This extension lets you modify JSON values natively, for example:
create extension if not exists plv8;
create or replace function update_mapping_v8(data json)
returns json language plv8 as $$
    var len = data['mapping'].length;
    for (var i = 0; i < len; i++) {
        var o = data['mapping'][i];
        if (o.src == 'up' && o.dest == 'down') {
            o.rule_names = ['some name'];   // an array, per the expected output
        }
    }
    return data;
$$;
update mapping_transform
set content_json = update_mapping_v8(content_json);
For MS Windows users: ready to install Windows binaries.
A plpgsql alternative solution uses jsonb type:
create or replace function update_mapping_plpgsql(data jsonb)
returns jsonb language plpgsql as $$
declare
    r record;
begin
    for r in
        select value, ordinality - 1 as pos
        from jsonb_array_elements(data->'mapping') with ordinality
        where value->>'src' = 'up' and value->>'dest' = 'down'
    loop
        data = jsonb_set(
            data,
            array['mapping', r.pos::text],
            r.value || '{"rule_names": ["some name"]}'
        );
    end loop;
    return data;
end $$;
update mapping_transform
set content_json = update_mapping_plpgsql(content_json::jsonb);
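If you'd rather avoid a helper function altogether, the same positional jsonb_set can be done in a single UPDATE. A sketch, joining on the system column ctid because the table in the question has no declared primary key (a real PK would be better), and assuming at most one matching array element per row:
update mapping_transform mt
set content_json = jsonb_set(
        mt.content_json,
        array['mapping', elem.pos::text],
        elem.value || '{"rule_names": ["some name"]}'
    )
from (
    select ctid, value, ordinality - 1 as pos
    from mapping_transform,
         jsonb_array_elements(content_json->'mapping') with ordinality
    where value->>'src' = 'up' and value->>'dest' = 'down'
) elem
where mt.ctid = elem.ctid;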
I build the path here: concat('{mapping,',(ord::int-1),'}')::text[], and the rest is much the same. Please note I join on text = text (because I don't know what your PK is; this is not recommended). On the left is the updated value, on the right the original:
vao=# with num as (select content_json,val,ord from mapping_transform, json_array_elements(content_json->'mapping') with ordinality as o (val,ord) where val->>'src' = 'up')
select
jsonb_pretty(
jsonb_set(t.content_json::jsonb,concat('{mapping,',(ord::int-1),'}')::text[],((t.content_json->'mapping'->(ord::int-1))::jsonb||'{"rule_names":["some name"]}')::jsonb)
)
, jsonb_pretty(t.content_json::jsonb)
from mapping_transform t
join num on num.content_json::text = t.content_json::text
/* of course join should be on PK, not text representation*/
;
jsonb_pretty | jsonb_pretty
-----------------------------+----------------------------
{ +| { +
"meta": { +| "meta": { +
"a": true +| "a": true +
}, +| }, +
"mapping": [ +| "mapping": [ +
"a", +| "a", +
"c", +| "c", +
{ +| { +
"a": 0, +| "a": 0, +
"src": "up", +| "src": "up", +
"dest": "down",+| "dest": "down"+
"rule_names": [+| }, +
"some name"+| "b" +
] +| ] +
}, +| }
"b" +|
] +|
} |
{ +| { +
"meta": { +| "meta": { +
"a": true +| "a": true +
}, +| }, +
"mapping": [ +| "mapping": [ +
"a", +| "a", +
{ +| { +
"a": 0, +| "a": 0, +
"src": "up", +| "src": "up", +
"dest": "down",+| "dest": "down"+
"rule_names": [+| }, +
"some name"+| "b" +
] +| ] +
}, +| }
"b" +|
] +|
} |
(2 rows)
And here is the test setup:
vao=# create table mapping_transform(content_json jsonb);
CREATE TABLE
vao=# insert into mapping_transform select '{
"meta": {
"a": true
},
"mapping": ["a",{
"src": "up",
"dest": "down",
"a": 0
},
"b"
]
}';
INSERT 0 1
vao=# insert into mapping_transform select '{
"meta": {
"a": true
},
"mapping": ["a","c",{
"src": "up",
"dest": "down",
"a": 0
},
"b"
]
}';
INSERT 0 1

AWK not printing properly

This is what I need to do:
I have a text file and parse it using awk. The output should be in JSON format. It should look like this:
{
"Record X" : { "Key1":"Value1", "Key2":"Value2"},
"Record Y" : { "Key1":"Value1", "Key2":"Value2"},
"Record Z" : { "Key1":"Value1", "Key2":"Value2"},
"Record A" : { "Key1":"Value1", "Key2":"Value2"}
}
Now, this is what the content of the text file looks like:
Record X
Key1 is Value1, Key2 is Value2
Record Y
Key1 is Value1, Key2 is Value2
Record Z
Key1 is Value1, Key2 is Value2
Record A
Key1 is Value1, Key2 is Value2
I tried creating a script to produce the output that I want. I'm on the first part, but I'm already stuck printing the line. This is my script:
awk 'BEGIN { print "{" }
{ if ($0 ~ /^Record /) { print "\"" $0 "\":" } }
END { print "}" }' myRecord.txt
And the output is this:
{
":ecord X
":ecord Y
":ecord Z
":ecord A
}
I do not understand why the script produces output like that.
Kindly tell me what's wrong. Thank you!
Here is another awk without using getline:
awk -F"[ ,]*" 'BEGIN {print "{"} /^Record/ {a=$0;next} {print "\""a"\" : { \""$2"\":\""$4"\", \""$5"\":\""$7"\"},"} END {print "}"}' file
{
"Record X" : { "Key1":"Value1", "Key2":"Value2"},
"Record Y" : { "Key1":"Value1", "Key2":"Value2"},
"Record Z" : { "Key1":"Value1", "Key2":"Value2"},
"Record A" : { "Key1":"Value1", "Key2":"Value2"},
}
If the trailing , is a problem, you can do it like this:
awk -F"[ ,]*" -v f=$(cat file | wc -l) 'BEGIN {print "{"} /^Record/ {a=$0;next} {print "\""a"\" : { \""$2"\":\""$4"\", \""$5"\":\""$7"\"}"(NR==f?"":",")} END {print "}"}' file
{
"Record X" : { "Key1":"Value1", "Key2":"Value2"},
"Record Y" : { "Key1":"Value1", "Key2":"Value2"},
"Record Z" : { "Key1":"Value1", "Key2":"Value2"},
"Record A" : { "Key1":"Value1", "Key2":"Value2"}
}
Or do it all in awk alone:
awk -F"[ ,]*" 'BEGIN {print "{"} FNR==NR {f=NR;next} /^Record/ {a=$0;next} {print "\""a"\" : { \""$2"\":\""$4"\", \""$5"\":\""$7"\"}"(FNR==f?"":",")} END {print "}"}' file{,}
{
"Record X" : { "Key1":"Value1", "Key2":"Value2"},
"Record Y" : { "Key1":"Value1", "Key2":"Value2"},
"Record Z" : { "Key1":"Value1", "Key2":"Value2"},
"Record A" : { "Key1":"Value1", "Key2":"Value2"}
}
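(Here file{,} is shell brace expansion for file file, so awk reads the input twice: the first pass (FNR==NR) just counts the lines into f, and the second pass does the printing.)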
Your main problem is that your input file was created on Windows and so has control-Ms (carriage returns) at the end of each line, causing corruption when the lines are printed. Remove them with dos2unix or similar before running your script. Do NOT use any getline solution suggested below, as that would be the wrong approach and introduce a lot of caveats and complexity (see http://awk.info/?tip/getline).
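If running dos2unix first isn't an option, one common workaround (a sketch based on your original script) is to strip the carriage returns inside awk itself before any other processing:
awk 'BEGIN { print "{" }
     { sub(/\r$/, "") }                 # drop the Windows carriage return
     /^Record / { print "\"" $0 "\":" }
     END { print "}" }' myRecord.txt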
Try this:
$ cat tst.awk
BEGIN{ print "{" }
NR%2 { id = $0; next }
{
sub(/^ +/,"")
gsub(/ is /,"\":\"")
gsub(/, /,"\", \"")
printf "%s\"%s\" : { \"%s\"}", (c++?",\n":""), id, $0
}
END{ print "\n}" }
$ awk -f tst.awk file
{
"Record X" : { "Key1":"Value1", "Key2":"Value2"},
"Record Y" : { "Key1":"Value1", "Key2":"Value2"},
"Record Z" : { "Key1":"Value1", "Key2":"Value2"},
"Record A" : { "Key1":"Value1", "Key2":"Value2"}
}
Using your flow logic:
awk 'BEGIN { print "{" }
/^Record /{
if (c){printf ",\n"}
printf("\"%s\":",$0);next}
{
gsub("is",":")
gsub(" *","\"")
printf(" {%s\"}",$0)
c++
}
END { print "\n}" }' infile
You could do this through awk's getline function:
$ awk 'BEGIN{printf "{\n"}/^Record/{var=$0; getline; w=$1; x=$3; y=$4; z=$6;}{printf "\""var"\"" " : { ""\""w"\""":\""x"\", \""y"\":\""z"\"},\n"} END{printf "}\n"}' file
{
"Record X" : { "Key1":"Value1,", "Key2":"Value2"},
"Record Y" : { "Key1":"Value1,", "Key2":"Value2"},
"Record Z" : { "Key1":"Value1,", "Key2":"Value2"},
"Record A" : { "Key1":"Value1,", "Key2":"Value2"},
}
Using GNU awk's gsub function:
$ awk -v RS="Record" 'BEGIN{print "{"} gsub(/\n/,"",$0){gsub(/.$/,"",$4); print "\""RS" "$1"\" : { \""$2"\":\""$4"\", \""$5"\":\""$7"\"},"} END{print "}"}' file
{
"Record X" : { "Key1":"Value1", "Key2":"Value2"},
"Record Y" : { "Key1":"Value1", "Key2":"Value2"},
"Record Z" : { "Key1":"Value1", "Key2":"Value2"},
"Record A" : { "Key1":"Value1", "Key2":"Value2"},
}