Remove a known subset of nodes from JSON, in SQL Azure

Once upon a time, there was one row of data (massively simplified; the actual JSON data is 10KB+), thus:
ID, json
1, '{
"a1.arr": [1,2,3],
"a1.boo": true,
"a1.str": "hello",
"a1.num": 123
}'
A process was supposed to write another record with predominantly different data:
ID, json
2, '{
"a1.arr": [1,2,3], //from ID 1
"a2.arr": [4,5,6], //new (and so are all below)
"a2.boo": false,
"a2.str": "goodbye",
"a2.num": 456
}'
But due to some external error, the original set of json from ID 1 ended up also being represented in ID 2, so now the table looks like this:
ID, json
1, '{
"a1.arr": [1,2,3],
"a1.boo": true,
"a1.str": "hello",
"a1.num": 123
}'
2, '{
"a1.arr": [1,2,3],
"a1.boo": true, //extraneous
"a1.str": "hello", //extraneous
"a1.num": 123, //extraneous
"a2.arr": [4,5,6],
"a2.boo": false,
"a2.str": "goodbye",
"a2.num": 456
}'
I'd like to know if there's a way to remove the extraneous lines from the ID 2 record.
I believe that the entire JSON string from ID 1 is represented in ID 2 as a contiguous block, so string replacement could work, but there's a chance that some reordering has taken place. It also gets a bit messy with the element that is supposed to remain.
There's also a chance that some of the a1.* nodes' values have been changed slightly (I didn't do a diff), but I'm happy to use just the node names, not their values, in assessing whether a node should be removed. One of the nodes (a1.arr) should be kept in ID 2. The result set should hence look like:
ID, json
1, '{
"a1.arr": [1,2,3],
"a1.boo": true,
"a1.str": "hello",
"a1.num": 123
}'
2, '{
"a1.arr": [1,2,3],
"a2.arr": [4,5,6],
"a2.boo": false,
"a2.str": "goodbye",
"a2.num": 456
}'
I've started playing about with https://dba.stackexchange.com/questions/168303/can-sql-server-2016-extract-node-names-from-json to get the list of node names from ID 1 that I want to remove from ID 2; I'm just not sure how I then strip them out of ID 2's JSON - presumably a deserialize, reduce and reserialize sequence?

You can follow this approach:
get the keys you want to remove with openjson over the value of the row with id = 1
use cross apply to filter those keys out of the rows with id > 1
rebuild the JSON string without the unwanted keys using STRING_AGG and GROUP BY
This code should work:
declare @tmp table ([id] int, jsonValue nvarchar(max))
declare @source_json nvarchar(max)
insert into @tmp values
(1, '{"a1.arr":[1,2,3],"a1.boo":true,"a1.str":"hello","a1.num":123}')
,(2, '{"a1.arr":[1,2,3],"a1.boo":true,"a1.str":"hello","a1.num":123,"a2.arr":[4,5,6],"a2.boo":false,"a2.str":"goodbye","a2.num":456}')
,(3, '{"a1.arr":[1,2,3],"a1.boo":true,"a1.str":"hello","a1.num":123,"a3.arr":[4,5,6],"a3.boo":false,"a3.str":"goodbye","a3.num":456}')
--get the json string from id = 1
select @source_json = jsonValue from @tmp where [id] = 1
--rebuild the json string for id > 1, removing the keys present in id = 1 (keeping a1.arr)
select t.[id],
'{' + STRING_AGG(
        '"' + STRING_ESCAPE(g.[key], 'json') + '":' +
        case g.[type]
            when 0 then 'null'                                       -- JSON null
            when 1 then '"' + STRING_ESCAPE(g.[value], 'json') + '"' -- string: re-quote and escape
            else g.[value]                                           -- number, bool, array, object: emit as-is
        end, ',') + '}' as [json]
from @tmp t cross apply openjson(jsonValue) g
where t.id > 1
and g.[key] not in (select [key] from openjson(@source_json) where [key] <> 'a1.arr')
group by t.id
Result:
id  json
2   {"a1.arr":[1,2,3],"a2.arr":[4,5,6],"a2.boo":false,"a2.str":"goodbye","a2.num":456}
3   {"a1.arr":[1,2,3],"a3.arr":[4,5,6],"a3.boo":false,"a3.str":"goodbye","a3.num":456}

Related

Get JSON object keys as array in Presto/Trino

I have JSON data like this in one of my columns
{"foo": 1, "bar": 2}
{"foo": 1}
and I would like to run a query that returns the keys as an array
foo,bar
foo
Convert your JSON into a MAP and then use map_keys():
-- sample data
WITH dataset(js) as (
VALUES (JSON '{"foo": 1, "bar": 2}'),
(JSON '{"foo": 1}')
)
-- query
SELECT array_join(map_keys(CAST(js AS MAP(VARCHAR, JSON))), ', ')
FROM dataset
Use json_parse() if your JSON column is of type VARCHAR
SELECT array_join(map_keys(CAST(json_parse(js) AS MAP(VARCHAR, JSON))), ', ')
FROM dataset
Output:
_col0
bar, foo
foo
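If you want the keys as an actual ARRAY rather than a joined string, you can drop array_join() - a minimal sketch against the same sample data:
-- returns array(varchar) instead of a comma-joined string
SELECT map_keys(CAST(js AS MAP(VARCHAR, JSON)))
FROM dataset
For the sample rows above this yields [bar, foo] and [foo], which you can feed into contains(), UNNEST, etc.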
I'm not sure how to work well with JSON directly, but if we convert the JSON to a MAP, the process is simple using map_values():
WITH data as (SELECT * FROM (VALUES JSON '{"foo": 1, "bar": 2}', JSON '{"foo": 1}') AS t(json_col))
SELECT map_values(CAST(json_col AS MAP(VARCHAR, INTEGER))) json_col
FROM data
Output:
json_col
{2,1}
{1}

JSON arrays of objects to PostgreSQL table format

I have a JSON file (array of objects) which I have to convert into a table format using a PostgreSQL query.
See the sample data below.
"b", "c", "d", "e" are to be extracted as separate tables, as they are arrays whose elements are objects.
I have tried using json_populate_recordset(), but it only works if I have a single array, e.g.:
[{a:"1",b:"2"},{a:"10",b:"20"}]
I have referred to some links and code:
jsonb_array_element example
postgreSQL functions
Sample Data:
{
"b":[
{columnB1:value, columnB2:value},
{columnB1:value, columnB2:value},
],
"c":[
{columnC1:value, columnC2:value, columnC3:value},
{columnC1:value, columnC2:value, columnC3:value},
{columnC1:value, columnC2:value, columnC3:value}
],
"d":[
{columnD1:value, columnD2:value},
{columnD1:value, columnD2:value},
],
"e":[
{columnE1:value, columnE2:value},
]
}
Expected output:
b should be one table in which columnB1 and columnB2 are displayed with their values.
Similarly tables c, d, e with their respective columns and values.
You can use jsonb_to_recordset(), but you need to unnest your JSON first. You need to do this inline, as this is a JSON processing function which cannot use derived values.
I am using validated JSON, as simplified and formatted at the end of this answer.
To unnest your JSON, use the notation below, which extracts the JSON object field with the given key.
--one level
select '{"a":1}'::json->'a'
result : 1
--two levels
select '{"a":{"b":[2]}}'::json->'a'->'b'
result : [2]
We now expand this to include json_to_recordset()
select * from
json_to_recordset(
'{"a":{"b":[{"f1":2,"f2":4},{"f1":3,"f2":6}]}}'::json->'a'->'b' --inner table b
)
as x("f1" int, "f2" int); --fields from table b
Or use json_array_elements(). Either way we need to list our fields. With the second solution the type will be json, not int, so you can't sum etc. (see the casting sketch after the output below).
with b as (select json_array_elements('{"a":{"b":[{"f1":2,"f2":4},{"f1":3,"f2":6}]}}'::json->'a'->'b') as jx)
select jx->'f1' as f1, jx->'f2' as f2 from b;
Output
f1 f2
2 4
3 6
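As noted, jx->'f1' yields json. A minimal sketch of the cast that makes the values usable for arithmetic (->> returns text, which we then cast to int):
-- cast the extracted text values so they can be summed
with b as (select json_array_elements('{"a":{"b":[{"f1":2,"f2":4},{"f1":3,"f2":6}]}}'::json->'a'->'b') as jx)
select sum((jx->>'f1')::int) as f1_total from b;
-- f1_total: 5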
We now use your data structure in jsonb_to_recordset()
select *
from jsonb_to_recordset(
  '{"a":{"b":[{"columnname1b":"value1b","columnname2b":"value2b"},{"columnname1b":"value","columnname2b":"value"}],"c":[{"columnname1":"value","columnname2":"value"},{"columnname1":"value","columnname2":"value"},{"columnname1":"value","columnname2":"value"}]}}'::jsonb->'a'->'b'
) as x(columnname1b text, columnname2b text);
Output:
columnname1b columnname2b
value1b value2b
value value
For table c
select *
from jsonb_to_recordset(
  '{"a":{"b":[{"columnname1b":"value1b","columnname2b":"value2b"},{"columnname1b":"value","columnname2b":"value"}],"c":[{"columnname1":"value","columnname2":"value"},{"columnname1":"value","columnname2":"value"},{"columnname1":"value","columnname2":"value"}]}}'::jsonb->'a'->'c'
) as x(columnname1 text, columnname2 text);
Output
columnname1 columnname2
value value
value value
value value
Sample JSON
{
"a": {
"b": [
{
"columnname1b": "value1b",
"columnname2b": "value2b"
},
{
"columnname1b": "value",
"columnname2b": "value"
}
],
"c": [
{
"columnname1": "value",
"columnname2": "value"
},
{
"columnname1": "value",
"columnname2": "value"
},
{
"columnname1": "value",
"columnname2": "value"
}
]
}
}
Well, I came up with some ideas, and here is one that worked. I was able to get one table at a time.
https://www.postgresql.org/docs/9.5/functions-json.html
I am using json_populate_recordset().
The column used in the first SELECT statement comes from a table whose column is of JSON type, which we are trying to extract into a table.
The 'tablename from column' argument in the json_populate_recordset() call is the key of the array we are trying to extract, and the alias b lists its columns and data types.
WITH input AS(
SELECT cast(column as json) as a
FROM tablename
)
SELECT b.*
FROM input c,
json_populate_recordset(NULL::record,c.a->'tablename from column') as b(columnname1 datatype, columnname2 datatype)
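For instance, a minimal sketch bound to the question's "b" array (note the quoted column names in the definition list, since json_populate_recordset() matches JSON keys case-sensitively):
SELECT b.*
FROM json_populate_recordset(
    NULL::record,
    '{"b":[{"columnB1":"v1","columnB2":"v2"},{"columnB1":"v3","columnB2":"v4"}]}'::json->'b'
) AS b("columnB1" text, "columnB2" text);
-- columnB1 | columnB2
-- v1       | v2
-- v3       | v4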

How to parse this array of dicts and extract key columns from a BigQuery external table

I have a giant array of dicts loaded from JSON into a date-partitioned BigQuery external table, with the table structure below:
Field name | Type    | Mode
meta       | Record  | Nullable
Messages   | String  | Repeated
date       | Integer | Nullable
Every "Messages" Field is in its own row/record in my Bigquery table (New_line_delimited_Json)
I am trying to parse the "messages" field/column to extract some fields Key1 and Key2 which happens to be inside an Array (of dicts). For sake of simplicity ,below is the snippet of json of which "messages" is a field that I am trying to unnest/explode.
Ignore this schema;updated schema below***
[
{
"meta": {
"table": "FEED",
"source": "CP1"
},
"Messages": [
"{
"Key1":"2022-01-10",
"Key2":"H21257061"
}"
],
"date": "20220110"
},
{
"meta": {
"table": "FEED",
"source": "CP1"
},
"Messages": [
"{
"Key1":"2022-01-11",
"Key2":"H21257062"
}"
],
"date": "20220111"
}
]
Updated schema on 01/17:
{
"meta": {
"table": "FEED",
"source": "CP1"
},
"Messages": [
"{
"Key1":"2022-01-10",
"Key2":"H21257061"
}",
"{
"Key1":"2022-01-10",
"Key2":"H21257062"
}"
],
"date": "20220110"
},
So far I have tried this, but I am getting SQL output with Key1 and Key2 as nulls:
WITH table AS (SELECT Messages as array_column FROM `project.dataset.table` )
SELECT
json_extract_scalar(flattened_array, '$.Messages.key1') as key1,
json_extract_scalar(flattened_array, '$.Messages.key2') as key2
FROM table t
CROSS JOIN UNNEST(t.array_column) AS flattened_array
Still a little ambiguous, so I assume the below correctly represents your table (at least, it matches the structure/schema in your question).
If my assumption is correct, consider the approach below.
select * except(id) from (
select to_json_string(t) id, kv[offset(0)] as key, kv[safe_offset(1)] as value
from your_table t,
t.messages as message,
unnest([struct( split(translate(message, '"', ''), ':') as kv)])
)
pivot (min(value) for key in ('Key1', 'Key2'))
If/when applied to the above sample data, the output has one row per source row, with Key1 and Key2 as columns.
Edit: trying to help you further. It now looks like each Messages value in your table is a single string containing one or more JSON objects.
In this case, try the below (a light modification of the previous version):
select * except(id) from (
select to_json_string(t) id, kv[offset(0)] as key, kv[safe_offset(1)] as value
from your_table t,
unnest(regexp_extract_all(messages, r'"[^"]+":"[^"]+"')) as message,
unnest([struct( split(translate(message, '"', ''), ':') as kv)])
)
pivot (min(value) for key in ('Key1', 'Key2'))
with the same Key1/Key2 output (regexp_extract_all pulls out each "key":"value" pair before the same split/pivot).
But obviously, I would use the simplest approach below (assuming each row's messages is a single JSON string):
select
json_extract_scalar(messages, '$.Key1') as Key1,
json_extract_scalar(messages, '$.Key2') as Key2
from your_table
SELECT
  JSON_QUERY(message, "$.Key1") as Key1,
  JSON_QUERY(message, "$.Key2") as Key2
FROM `project.dataset.table` as table
CROSS JOIN UNNEST(table.Messages) as message
CROSS JOIN UNNEST flattens the array, returning a row for each message. After that, JSON_QUERY extracts the needed values from the JSON string. Note that JSON_QUERY returns the JSON-formatted value (string scalars keep their double quotes); use JSON_VALUE or JSON_EXTRACT_SCALAR if you want the bare scalar.

Using BOTH scalar values and JSON objects as JSON values

I have a local table variable that I'm trying to populate with JSON key-value pairs. Sometimes the values are themselves JSON strings:
DECLARE @Values TABLE
(
JsonKey varchar(200),
JsonValue varchar(max)
)
An example of what this ends up looking like:
+---------+--------------------------------------+
| JsonKey | JsonValue |
+---------+--------------------------------------+
| foo | bar |
| foo | [{"label":"fooBar","Id":"fooBarId"}] |
+---------+--------------------------------------+
After populating it, I attempt to build it all up into a single JSON string, like so:
DECLARE @Json nvarchar(max) =
(
SELECT V.JsonKey as 'name',
V.JsonValue as 'value'
FROM @Values V
for json path
)
The problem with this is that it turns the JSON values into a string, rather than treating them as JSON. This results in those values not being parsed correctly.
An example of what it ends up looking like:
[
{
"name": "foo",
"value": "bar"
},
{
"name": "foo",
"value": "[{\"label\":\"fooBar\",\"Id\":\"fooBarId\"}]"
}
]
I am trying to get the JSON for the second value to NOT be escaped or wrapped in double quotes. What I would like to see is this:
[
{
"name": "foo",
"value": "bar"
},
{
"key": "foo",
"value": [
{
"label": "fooBar",
"Id": "fooBarId"
}
]
}
]
If that value will ONLY ever be JSON, I can instead use JSON_QUERY() in the JSON build-up, like this:
DECLARE @Json nvarchar(max) =
(
SELECT V.JsonKey as 'name',
JSON_QUERY(V.JsonValue) as 'value'
FROM @Values V
for json path
)
Building it up like this gives me the result I want, but it errors when the JsonValue column is not valid JSON. I attempted to put it in a CASE expression, to only use JSON_QUERY() when JsonValue was valid JSON, but since a CASE expression must return a single type, it turned the value into a string again, and I got a repeat of the first example (see the sketch below). I have not been able to find an elegant solution to this, and it really feels like there should be one that I'm just missing. Any help will be appreciated.
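A minimal sketch of that CASE attempt (reconstructed for illustration, not the asker's exact code):
SELECT V.JsonKey as 'name',
       CASE WHEN ISJSON(V.JsonValue) = 1
            THEN JSON_QUERY(V.JsonValue)
            ELSE V.JsonValue
       END as 'value' -- both branches collapse to nvarchar, so FOR JSON re-escapes the JSON branch
FROM @Values V
for json path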
One possible approach is to generate a statement with duplicate column names ([value]). By default FOR JSON AUTO does not include NULL values in the output, so the result is the expected JSON. Just note that you must not use INCLUDE_NULL_VALUES in the statement, or the final JSON will contain duplicate keys.
Table:
DECLARE @Values TABLE (
JsonKey varchar(200),
JsonValue varchar(max)
)
INSERT INTO @Values
(JsonKey, JsonValue)
VALUES
('foo', 'bar'),
('foo', '[{"label":"fooBar","Id":"fooBarId"}]')
Statement:
SELECT
JsonKey AS [name],
JSON_QUERY(CASE WHEN ISJSON(JsonValue) = 1 THEN JSON_QUERY(JsonValue) END) AS [value],
CASE WHEN ISJSON(JsonValue) = 0 THEN JsonValue END AS [value]
FROM @Values
FOR JSON AUTO
Result:
[{"name":"foo","value":"bar"},{"name":"foo","value":[{"label":"fooBar","Id":"fooBarId"}]}]

PostgreSQL: exclude complete jsonb array if one element fails the WHERE clause

Assume a table json_table with columns id (int), data (jsonb).
A sample jsonb value would be
{"a": [{"b":{"c": "xxx", "d": 1}},{"b":{"c": "xxx", "d": 2}}]}
When I use an SQL statement like the following:
SELECT data
FROM json_table j, jsonb_array_elements(j.data#>'{a}') dt
WHERE (dt#>>'{b,d}')::integer NOT IN (2,4,6,9)
GROUP BY id;
... the two array elements are unnested, and the row is still returned because the element with d = 1 satisfies the WHERE clause. This makes sense, since each array element is considered individually. In this example I get back the complete row:
{"a": [{"b":{"c": "xxx", "d": 1}},{"b":{"c": "xxx", "d": 2}}]}
I'm looking for a way to exclude the complete json_table row when any jsonb array element fails the condition.
You can move the condition to the WHERE clause and use NOT EXISTS:
SELECT data
FROM json_table j
WHERE NOT EXISTS (SELECT 1
FROM jsonb_array_elements(j.data#>'{a}') dt
WHERE (dt#>>'{b,d}')::integer IN (2, 4, 6, 9)
);
You can achieve it with the following query, using jsonb_path_match() (in a jsonpath filter, @ refers to the current element):
select data
from json_table
where jsonb_path_match(data, '!exists($.a[*].b.d ? (@ == 2 || @ == 4 || @ == 6 || @ == 9))')