SQL to JSON - Grouping Results into JSON Array

I am trying to come up with an SQL solution for arranging output to match an expected JSON format.
I have some simple SQL to highlight where the issue is coming from:
SELECT TOP 1 'Surname' AS 'name.family'
,'Forename, Middle Name' AS 'name.given'
,'Title' AS 'name.prefix'
,getDATE() AS 'birthdate'
,'F' AS 'gender'
,'Yes' AS 'active'
,'work' AS 'telecom.use'
,'phone' AS 'telecom.system'
,'12344556' AS 'telecom.value'
FROM tblCustomer
FOR json path
Which will return JSON as;
[
{
"name": {
"family": "Surname",
"given": "Forename, Middle Name",
"prefix": "Title"
},
"birthdate": "2019-02-13T12:06:45.490",
"gender": "F",
"active": "Yes",
"telecom": {
"use": "work",
"system": "phone",
"value": "12344556"
}
}
]
What I need to do is add extra objects to the "telecom" array so that it appears as:
[
{
"name": {
"family": "Surname",
"given": "Forename, Middle Name",
"prefix": "Title"
},
"birthdate": "2019-02-13T12:06:45.490",
"gender": "F",
"active": "Yes",
"telecom": [
{
"use": "work",
"system": "phone",
"value": "12344556"
},
{
"use": "work",
"system": "home",
"value": "12344556"
}
]
}
]
I incorrectly assumed I could keep adding to my SQL as follows:
SELECT TOP 1 'Surname' AS 'name.family'
,'Forename, Middle Name' AS 'name.given'
,'Title' AS 'name.prefix'
,getDATE() AS 'birthdate'
,'F' AS 'gender'
,'Yes' AS 'active'
,'work' AS 'telecom.use'
,'phone' AS 'telecom.system'
,'12344556' AS 'telecom.value'
,'home' AS 'telecom.use'
FROM tblCustomer
FOR json path
I expected it to nest the items according to my alias names; however, it fails with:
Property 'telecom.use' cannot be generated in JSON output due to a
conflict with another column name or alias. Use different names and
aliases for each column in SELECT list.
Is there a way to handle this nesting in SQL, or will I need to create separate FOR JSON queries and merge them?
Thanks
Using @@VERSION: Microsoft SQL Server 2017 (RTM) - 14.0.1000.169 (X64)
Aug 22 2017 17:04:49 Copyright (C) 2017 Microsoft Corporation
Express Edition (64-bit) on Windows Server 2012 R2 Datacenter 6.3
(Build 9600: ) (Hypervisor)
Small edit to the question to use dynamic values rather than forced static members.
SELECT TOP 1 'Surname' AS 'name.family'
,'Forename, Middle Name' AS 'name.given'
,'Title' AS 'name.prefix'
,getDATE() AS 'birthdate'
,'F' AS 'gender'
,'Yes' AS 'active'
,'work' AS 'telecom.use'
,'phone' AS 'telecom.system'
,customerWorkTelephone AS 'telecom.value'
,'home' AS 'telecom.use'
,'phone' AS 'telecom.system'
,customerHomeTelephone AS 'telecom.value'
FROM tblCustomer
FOR json path
The "value" items will be taken from columns within the tblCustomer table. I've tried to make good on the responses below but can't get the logic quite correct in the subquery.
Thanks again
FURTHER EDIT
I have some SQL that gives me the output I expect; however, I am not sure it's the best it could be. Is my approach less than optimal?
SELECT TOP 1 [name.family] = 'Surname'
,[name.given] = 'Forename, Middle Name'
,[name.prefix] = 'Title'
,[birthdate] = GETDATE()
,[gender] = 'F'
,[active] = 'Yes'
,[telecom] = (
SELECT [use] = V.used
,[system] = 'phone'
,[value] = CASE V.used
WHEN 'work'
THEN cu.customerWorkTelephone
WHEN 'home'
THEN cu.customerHomeTelephone
when 'mobile'
then cu.customerMobileTelephone
END
FROM (
VALUES ('work')
,('home')
,('mobile')
) AS V(used)
FOR json path
)
FROM tblCustomer cu
FOR JSON PATH

Using a subselect with a few hard-coded rows:
SELECT TOP 1
'Surname' AS 'name.family'
,'Forename, Middle Name' AS 'name.given'
,'Title' AS 'name.prefix'
,getDATE() AS 'birthdate'
,'F' AS 'gender'
,'Yes' AS 'active'
,'telecom' = (
SELECT
'work' AS 'use'
,V.system AS 'system'
,'12344556' AS 'value'
FROM
(VALUES
('phone'),
('home')) AS V(system)
FOR JSON PATH)
FROM tblCustomer
FOR JSON PATH
Note the lack of the telecom. prefix inside the subquery.
Results (without the table reference):
[
{
"name": {
"family": "Surname",
"given": "Forename, Middle Name",
"prefix": "Title"
},
"birthdate": "2019-02-13T12:53:08.400",
"gender": "F",
"active": "Yes",
"telecom": [
{
"use": "work",
"system": "phone",
"value": "12344556"
},
{
"use": "work",
"system": "home",
"value": "12344556"
}
]
}
]
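To see why one alias per property works while a repeated 'telecom.use' alias fails, here is a rough Python model of the nesting. It is only a sketch of the behaviour described above, not SQL Server's implementation; the helper name nest_dotted is made up for illustration.

```python
import json


def nest_dotted(columns):
    """Build one JSON object from (alias, value) pairs, nesting on dots."""
    result = {}
    for alias, value in columns:
        *parents, leaf = alias.split(".")
        node = result
        for key in parents:
            node = node.setdefault(key, {})
        if leaf in node:
            # Mirrors the 'telecom.use' conflict reported in the question.
            raise ValueError(f"Property '{alias}' conflicts with another alias")
        node[leaf] = value
    return result


# Rows standing in for the VALUES list of the subquery.
telecom_rows = [
    {"use": "work", "system": system, "value": "12344556"}
    for system in ("phone", "home")
]

row = nest_dotted([
    ("name.family", "Surname"),
    ("name.given", "Forename, Middle Name"),
    ("gender", "F"),
    ("telecom", telecom_rows),  # the whole array sits under a single alias
])

print(json.dumps([row], indent=2))
```

The array is built separately and attached under one key, which is the role the nested FOR JSON PATH subquery plays in the query.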
PS: Particularly for SQL Server, I find putting the alias on the left side more readable:
SELECT TOP 1
[name.family] = 'Surname',
[name.given] = 'Forename, Middle Name',
[name.prefix] = 'Title',
[birthdate] = GETDATE(),
[gender] = 'F',
[active] = 'Yes',
[telecom] = (
SELECT
[use] = 'work',
[system] = V.system,
[value] = '12344556'
FROM
(VALUES ('phone'), ('home')) AS V(system)
FOR JSON PATH)
FROM tblCustomer
FOR JSON PATH

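SELECT
EMP.ID,
EMP.NAME,
-- Alias the second NAME column: FOR JSON PATH rejects duplicate property names
DEP.NAME AS [DEPARTMENT.NAME]
FROM EMPLOYEE EMP INNER JOIN DEPARTMENT DEP ON EMP.DEPID=DEP.DEPID
WHERE EMP.SALARY>1000
FOR JSON PATH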

Related

Querying over PostgreSQL JSONB column

I have a table "blobs" with a column "metadata" in jsonb data-type,
Example:
{
"total_count": 2,
"items": [
{
"name": "somename",
"metadata": {
"metas": [
{
"id": "11258",
"score": 6.1,
"status": "active",
"published_at": "2019-04-20T00:29:00",
"nvd_modified_at": "2022-04-06T18:07:00"
},
{
"id": "9251",
"score": 5.1,
"status": "active",
"published_at": "2018-01-18T23:29:00",
"nvd_modified_at": "2021-01-08T12:15:00"
}
]
}
}
]
}
I want to identify statuses in the "metas" array that match with certain, given strings. I have tried the following so far but without results:
SELECT * FROM blobs
WHERE metadata is not null AND
(
SELECT count(*) FROM jsonb_array_elements(metadata->'metas') AS cn
WHERE cn->>'status' IN ('active','reported')
) > 0;
It would also be sufficient if I could compare the string with "status" in the first array object.
I am using PostgreSQL 9.6.24
For clarity I usually break code into a series of WITH statements. My idea for your problem would be to use a JSON path (https://www.postgresql.org/docs/12/functions-json.html#FUNCTIONS-SQLJSON-PATH) and the function jsonb_path_query. Note that jsonb_path_query is only available from PostgreSQL 12 onward, so it will not run on 9.6.
The code below gives a list of counts; I will leave the rest to you to get the final data.
I've added an ID column just to have something to join on. Otherwise, join on metadata.
Also, note the additional " in the where condition: casting the jsonb result to character varying keeps the JSON string quotes. The left join in blob_ext is there just to produce a null value if metadata is not present or the path does not match.
with blob as (
select row_number() over()"id", * from (VALUES
(
'{
"total_count": 2,
"items": [
{
"name": "somename",
"metadata": {
"metas": [
{
"id": "11258",
"score": 6.1,
"status": "active",
"published_at": "2019-04-20T00:29:00",
"nvd_modified_at": "2022-04-06T18:07:00"
},
{
"id": "9251",
"score": 5.1,
"status": "active",
"published_at": "2018-01-18T23:29:00",
"nvd_modified_at": "2021-01-08T12:15:00"
}
]
}
}
]}'::jsonb),
(null::jsonb)) b(metadata)
)
, blob_ext as (
select bb.*, blob_sts.status
from blob bb
left join (
select
bb2.id,
jsonb_path_query (bb2.metadata::jsonb, '$.items[*].metadata.metas[*].status'::jsonpath)::character varying "status"
FROM blob bb2
) as blob_sts ON
blob_sts.id = bb.id
)
select bbe.id, count(*) cnt, bbe.metadata
from blob_ext bbe
where bbe.status in ('"active"', '"reported"')
group by bbe.id, bbe.metadata;
One way is to peel off one layer at a time with jsonb_extract_path() and jsonb_array_elements():
with cte_items as (
select id,
metadata,
jsonb_extract_path(jx.value,'metadata','metas') as metas
from blobs,
lateral jsonb_array_elements(jsonb_extract_path(metadata,'items')) as jx),
cte_metas as (
select id,
metadata,
jsonb_extract_path_text(s.value,'status') as status
from cte_items,
lateral jsonb_array_elements(metas) s)
select distinct
id,
metadata
from cte_metas
where status in ('active','reported');
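The layer-by-layer idea can be modelled outside the database as well. The following Python sketch walks items, then metas, then status, and keeps each blob at most once. The function name and the sample rows (including the "closed" status) are made up for illustration.

```python
def matching_blob_ids(blobs, wanted):
    """Return ids of blobs with at least one meta whose status is in wanted."""
    matches = []
    for blob in blobs:
        metadata = blob.get("metadata") or {}  # NULL metadata yields no items
        for item in metadata.get("items", []):  # first layer: items[*]
            metas = item.get("metadata", {}).get("metas", [])  # second layer
            if any(meta.get("status") in wanted for meta in metas):
                matches.append(blob["id"])
                break  # one hit is enough, like SELECT DISTINCT
    return matches


blobs = [
    {"id": 1, "metadata": {"items": [{"name": "somename", "metadata": {"metas": [
        {"id": "11258", "status": "active"},
        {"id": "9251", "status": "active"},
    ]}}]}},
    {"id": 2, "metadata": None},
    {"id": 3, "metadata": {"items": [{"metadata": {"metas": [{"status": "closed"}]}}]}},
]

print(matching_blob_ids(blobs, {"active", "reported"}))
```

The original attempt in the question skipped the items layer (metadata->'metas'), which is why it returned nothing.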

How to Add Multiple JSON_BUILD_OBJECT entries to a JSON_AGG

I am using PostgreSQL 9.4 and I have a requirement to have an 'addresses' array which contains closed JSON objects for different types of address (residential and correspondence). The structure should look like this:
[
{
"addresses": [
{
"addressLine1": "string",
"type": "residential"
},
{
"addressLine1": "string",
"type": "correspondence"
}
],
"lastName": "string"
}
]
...and here's some example data to illustrate the desired result:
[
{
"addresses": [
{
"addressLine1": "54 ASHFIELD PADDOCK",
"type": "residential"
},
{
"addressLine1": "135 MERRION HILL",
"type": "correspondence"
}
],
"lastName": "WRIGHT"
},
{
"addresses": [
{
"addressLine1": "13 BOAKES GROVE",
"type": "residential"
},
{
"addressLine1": "46 BEACONSFIELD GRANGE",
"type": "correspondence"
}
],
"lastName": "DOHERTY"
}
]
This is where I've gotten to with my SQL:
SELECT
json_agg(
json_build_object('addresses',(SELECT json_agg(json_build_object('addressLine1',c2.address_line_1,
'type',c2.address_type
)
)
FROM my_customer_table c2
WHERE c2.person_id=c.person_id
),
'addresses',(SELECT json_agg(json_build_object('addressLine1',c2.corr_address_line_1,
'type',c2.corr_address_type
)
)
FROM my_customer_table c2
WHERE c2.person_id=c.person_id
),
'lastName',c.surname
)
) AS customer_json
FROM
my_customer_table c
WHERE
c.corr_address_type IS NOT NULL /*exclude customers without correspondence addresses*/
...and this runs; however, it repeats the 'addresses' key twice and wraps each address variant in its own array instead of putting one array around both.
What I stupidly thought would work is the following:
SELECT
json_agg(
json_build_object('addresses',(SELECT json_agg(json_build_object('addressLine1',c2.address_line_1,
'type',c2.address_type
),
json_build_object('addressLine1',c2.corr_address_line_1,
'type',c2.corr_address_type
)
)
FROM my_customer_table c2
WHERE c2.person_id=c.person_id
),
'lastName',c.surname
)
) AS customer_json
FROM
my_customer_table c
WHERE
c.corr_address_type IS NOT NULL /*exclude customers without correspondence addresses*/
...however this throws an error:
"ERROR: function json_agg(json, json) does not exist.
LINE 3: json_build_object('addresses',(SELECT json_agg(json_build_o...
HINT: No function matches the given name and argument types. You might need to add explicit type casts."
I've Googled this, however no posts found seem to relate to the same kind of result I'm trying to get.
Does anyone know if it's possible to have multiple JSON_BUILD_OBJECT entries inside of an array?
A colleague has found a solution to this. It's totally different to the way I was approaching it, but works nicely. Here's the working code:
SELECT
json_agg(subquery.customer_json) AS customer_json
FROM
(
SELECT
row_to_json(t) AS customer_json
FROM (
SELECT
(
SELECT array_to_json(array_agg(addresses_union))
FROM (
SELECT
c.address_line_1 AS "addressLine1",
c.address_type as type
UNION ALL
SELECT
c.corr_address_Line_1 AS "addressLine1",
c.corr_address_type as type
) AS addresses_union
) as addresses,
c.surname AS "lastName"
FROM
my_customer_table c
WHERE
c.corr_address_type IS NOT NULL /*exclude customers without correspondence address*/
) t
) subquery
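As a rough model of what the UNION ALL subquery does per customer row, the Python sketch below builds the two address objects side by side and wraps them in a single list. The column names follow the query; the function name and the third sample row (a customer without a correspondence address) are assumptions added for illustration.

```python
import json


def customer_json(rows):
    """Build the target structure: one 'addresses' array per customer."""
    result = []
    for c in rows:
        if c["corr_address_type"] is None:  # WHERE corr_address_type IS NOT NULL
            continue
        addresses = [  # the two UNION ALL branches, in order
            {"addressLine1": c["address_line_1"], "type": c["address_type"]},
            {"addressLine1": c["corr_address_line_1"], "type": c["corr_address_type"]},
        ]
        result.append({"addresses": addresses, "lastName": c["surname"]})
    return result


customers = [
    {"surname": "WRIGHT",
     "address_line_1": "54 ASHFIELD PADDOCK", "address_type": "residential",
     "corr_address_line_1": "135 MERRION HILL", "corr_address_type": "correspondence"},
    {"surname": "DOHERTY",
     "address_line_1": "13 BOAKES GROVE", "address_type": "residential",
     "corr_address_line_1": "46 BEACONSFIELD GRANGE", "corr_address_type": "correspondence"},
    # Hypothetical row with no correspondence address; it is filtered out.
    {"surname": "EXAMPLE",
     "address_line_1": "1 SAMPLE ROAD", "address_type": "residential",
     "corr_address_line_1": None, "corr_address_type": None},
]

print(json.dumps(customer_json(customers), indent=2))
```

The key point is that both address variants come from the same row, so they are turned into two elements first and aggregated once, rather than aggregated twice.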

How to group multiple values to only two groups?

So, I have 2 tables.
Type table
id  Name
--  -----------
1   General
2   Mostly Used
3   Low
Component table
id  Name         typeId
--  -----------  ------
1   Component 1  1
2   Component 2  1
4   Component 4  2
6   Component 6  2
7   Component 5  3
There can be numerous types but I want to get only 'General' and 'Others' as types along with the component as follows:
[{
"General": [{
"id": "1",
"name": "General",
"component": [{
"id": 1,
"name": "component 1",
"componentTypeId": 1
}, {
"id": 2,
"name": "component 2",
"componentTypeId": 1
}]
}],
"Others": [{
"id": "2",
"name": "Mostly Used",
"component": [{
"id": 4,
"name": "component 4",
"componentTypeId": 2
}, {
"id": 6,
"name": "component 6",
"componentTypeId": 2
}]
},
{
"id": "3",
"name": "Low",
"component": [{
"id": 7,
"name": "component 5",
"componentTypeId": 3
}]
}
]
}]
WITH CTE_TYPES AS (
SELECT
CASE WHEN t. "name" <> 'General' THEN
'Others'
ELSE
'General'
END AS TYPE,
t.id,
t.name
FROM
type AS t
GROUP BY
TYPE,
t.id
),
CTE_COMPONENT AS (
SELECT
c.id,
c.name,
c.typeid
FROM
component c
)
SELECT
JSON_AGG(jsonb_build_object ('id', CT.id, 'name', CT.name, 'type', CT.type, 'component', CC))
FROM
CTE_TYPES CT
INNER JOIN CTE_COMPONENT CC ON CT.id = CC.typeid
GROUP BY
CT.type
I get 2 types from the query, as I expected, but the components are not grouped together.
Can you also point to resources to learn advanced SQL queries?
Here is a solution that produces the expected result specified in your question:
First part
The first part of the query aggregates all the components with the same TypeId into a jsonb array. It also computes a new type column, whose value is 'General' for the type named General and 'Others' for every other type name:
SELECT CASE WHEN t.name <> 'General' THEN 'Others' ELSE 'General' END AS type
, t.id, t.name
, jsonb_build_object('id', t.id, 'name', t.name, 'component', jsonb_agg(jsonb_build_object('id', c.id, 'name', c.name, 'componentTypeId', c.typeid))) AS list
FROM component AS c
INNER JOIN type AS t
ON t.id = c.typeid
GROUP BY t.id, t.name
jsonb_build_object builds a jsonb object from a set of key/value arguments
jsonb_agg aggregates jsonb objects into a single jsonb array.
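The first part can be pictured with a small Python sketch that uses the sample tables from the question. It only models the join, the grouping and the General/Others classification; the function name is made up, and this is not how PostgreSQL evaluates the query.

```python
from collections import defaultdict

types = [(1, "General"), (2, "Mostly Used"), (3, "Low")]
components = [
    (1, "Component 1", 1), (2, "Component 2", 1),
    (4, "Component 4", 2), (6, "Component 6", 2),
    (7, "Component 5", 3),
]


def group_by_type(types, components):
    """One row per type, with its components collected into a list."""
    by_type = defaultdict(list)
    for comp_id, name, type_id in components:  # plays the role of jsonb_agg
        by_type[type_id].append(
            {"id": comp_id, "name": name, "componentTypeId": type_id}
        )
    rows = []
    for type_id, type_name in types:
        if type_id not in by_type:  # INNER JOIN drops types without components
            continue
        rows.append({
            "type": "General" if type_name == "General" else "Others",
            "list": {"id": type_id, "name": type_name, "component": by_type[type_id]},
        })
    return rows


for row in group_by_type(types, components):
    print(row["type"], row["list"]["name"], len(row["list"]["component"]))
```

Each resulting row already carries its own component array, so the second part only has to combine rows that share the same type value.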
Second part
The second part of the query is much more complex because of the structure of your expected result, where you want to nest the types other than General, together with their components, inside each other in TypeId order; i.e. the Low type with TypeId = 3 is nested inside the Mostly Used type with TypeId = 2:
{ "id": "2"
, "name": "Mostly Used"
, "component": [ { "id": 4
, "name": "component 4"
, "componentTypeId": 2
}
, { ... }
, { "id": "3"
, "name": "Low" --> 'Low' type is nested inside 'Mostly Used' type
, "component": [ { "id": 7
, "name": "component 5"
, "componentTypeId": 3
}
, { ... }
]
}
]
}
To build such a nested structure for an arbitrary number of TypeId values, you could write a recursive query, but here I prefer to create a user-defined aggregate function, which makes the query much simpler and more readable (see the manual). The aggregate function jsonb_set_inv_agg is based on the user-defined function jsonb_set_inv, which inserts the jsonb object x into the existing jsonb object z at the path p. That function is in turn based on the standard jsonb_set function:
CREATE OR REPLACE FUNCTION jsonb_set_inv(x jsonb, p text[], z jsonb, b boolean)
RETURNS jsonb LANGUAGE sql IMMUTABLE AS
$$
SELECT jsonb_set(z, p, COALESCE(z#>p || x, z#>p), b) ;
$$ ;
CREATE AGGREGATE jsonb_set_inv_agg(p text[], z jsonb, b boolean)
( sfunc = jsonb_set_inv
, stype = jsonb
) ;
Based on the newly created aggregate function jsonb_set_inv_agg and the jsonb_agg and jsonb_build_object standard functions already seen above, the final query is :
SELECT jsonb_agg(jsonb_build_object('General', x.list)) FILTER (WHERE x.type = 'General')
|| jsonb_build_object('Others', jsonb_set_inv_agg('{component}', x.list, true ORDER BY x.id DESC) FILTER (WHERE x.type = 'Others'))
FROM
( SELECT CASE WHEN t.name <> 'General' THEN 'Others' ELSE 'General' END AS type
, t.id, t.name
, jsonb_build_object('id', t.id, 'name', t.name, 'component', jsonb_agg(jsonb_build_object('id', c.id, 'name', c.name, 'componentTypeId', c.typeid))) AS list
FROM component AS c
INNER JOIN type AS t
ON t.id = c.typeid
GROUP BY t.id, t.name
) AS x
See the full test result in dbfiddle.

Joining tables with filter condition on multiple columns in Oracle DBMS

{
"description": "test",
"id": "1",
"name": "test",
"prod": [
{
"id": "1",
"name": "name",
"re": [
{
"name": "name1",
"value": "1"
},
{
"name": "name2",
"value": "1"
},
{
"name": "name3",
"value": "0"
},
{
"name": "name4",
"value": "0"
}
]
}
]
}
Here is the best I can do with your JSON input and your sample output.
Note that your document has a unique "id" and "name" ("1" and "test" in your example). Then it has an array named "productSpecificationRelationship". Each element of this array is an object with its own "id" - in the query, I show this id with the column name PSR_ID (PSR for Product Specification Relationship). Also, each object in this first-level array contains a sub-array (second level), with objects with "name" ("name" again!) and "value" keys. (This looks very much like an entity-attribute-value model - very poor practice.) In the intermediate step in my query (before pivoting), I call these RC_NAME and RC_VALUE (RC for Relationship Characteristic).
In your sample output you have more than one value in the ID and NAME columns. I don't see how that is possible; perhaps from unpacking more than one document? The JSON document you shared with us has "id" and "name" as top-level attributes.
In the output, I understand (or rather, assume, since I didn't understand too much from your question) that you should also include the PSR_ID - there is only one in your document, with value "10499", but in principle there may be more than one, and the output will have one row per such id.
Also, I assume the "name" values are limited to the four you mentioned (or, if there can be more, you are only interested in those four in the output).
With all that said, here is the query. Note that I called the table ES for simplicity. Also, you will see that I had to go to nested path twice (since your document includes an array of arrays, and I wanted to pick up the PSR_ID from the outer array and the tokens from the nested arrays).
TABLE SETUP
create table es (payloadentityspecification clob
check (payloadentityspecification is json) );
insert into es (payloadentityspecification) values (
'{
"description": "test",
"id": "1",
"name": "test",
"productSpecificationRelationship": [
{
"id": "10499",
"relationshipType": "channelRelation",
"relationshipCharacteristic": [
{
"name": "out_of_home",
"value": "1"
},
{
"name": "out_of_home_ios",
"value": "1"
},
{
"name": "out_of_home_android",
"value": "0"
},
{
"name": "out_of_home_web",
"value": "0"
}
]
}
]
}');
commit;
QUERY
with
prep (id, name, psr_id, rc_name, rc_value) as (
select id, name, psr_id, rc_name, rc_value
from es,
json_table(payloadentityspecification, '$'
columns (
id varchar2(10) path '$.id',
name varchar2(40) path '$.name',
nested path '$.productSpecificationRelationship[*]'
columns (
psr_id varchar2(10) path '$.id',
nested path '$.relationshipCharacteristic[*]'
columns (
rc_name varchar2(50) path '$.name',
rc_value varchar2(50) path '$.value'
)
)
)
)
)
select id, name, psr_id, ooh, ooh_android, ooh_ios, ooh_web
from prep
pivot ( min(case rc_value when '1' then 'TRUE'
when '0' then 'FALSE' else 'UNDEFINED' end)
for rc_name in ( 'out_of_home' as ooh,
'out_of_home_android' as ooh_android,
'out_of_home_ios' as ooh_ios,
'out_of_home_web' as ooh_web
)
)
;
OUTPUT
ID NAME PSR_ID OOH   OOH_ANDROID OOH_IOS OOH_WEB
-- ---- ------ ----- ----------- ------- -------
1  test 10499  TRUE  FALSE       TRUE    FALSE
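For readers who want to check the unnest-then-pivot logic without a database, here is a Python sketch over the same document. The names COLUMNS, LABELS and pivot are illustrative. One simplification: if a characteristic name were repeated, this sketch keeps the last value, whereas the SQL takes MIN.

```python
doc = {
    "description": "test",
    "id": "1",
    "name": "test",
    "productSpecificationRelationship": [{
        "id": "10499",
        "relationshipCharacteristic": [
            {"name": "out_of_home", "value": "1"},
            {"name": "out_of_home_ios", "value": "1"},
            {"name": "out_of_home_android", "value": "0"},
            {"name": "out_of_home_web", "value": "0"},
        ],
    }],
}

# Characteristic name -> output column, as in the PIVOT ... FOR ... IN list.
COLUMNS = {
    "out_of_home": "OOH",
    "out_of_home_android": "OOH_ANDROID",
    "out_of_home_ios": "OOH_IOS",
    "out_of_home_web": "OOH_WEB",
}
LABELS = {"1": "TRUE", "0": "FALSE"}


def pivot(document):
    """One output row per element of the outer array, one column per name."""
    rows = []
    for psr in document.get("productSpecificationRelationship", []):
        row = {"ID": document["id"], "NAME": document["name"], "PSR_ID": psr["id"]}
        row.update({column: None for column in COLUMNS.values()})  # NULL if absent
        for rc in psr.get("relationshipCharacteristic", []):
            column = COLUMNS.get(rc["name"])
            if column is not None:  # ignore names outside the four of interest
                row[column] = LABELS.get(rc["value"], "UNDEFINED")
        rows.append(row)
    return rows


print(pivot(doc))
```

The outer loop corresponds to the first NESTED PATH and the inner loop to the second, which is why PSR_ID can be read from the outer element.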
Conditional aggregation can also be used to pivot the result set after extracting the values with the JSON_TABLE() and JSON_VALUE() functions, for example:
SELECT JSON_VALUE(payloadentityspecification, '$.name') AS channel_map_name,
MAX(CASE WHEN name = 'out_of_home' THEN
DECODE(value,1,'TRUE',0,'FALSE','UNDEFINED')
END) AS ooh,
MAX(CASE WHEN name = 'out_of_home_android' THEN
DECODE(value,1,'TRUE',0,'FALSE','UNDEFINED')
END) AS ooh_android,
MAX(CASE WHEN name = 'out_of_home_ios' THEN
DECODE(value,1,'TRUE',0,'FALSE','UNDEFINED')
END) AS ooh_ios,
MAX(CASE WHEN name = 'out_of_home_web' THEN
DECODE(value,1,'TRUE',0,'FALSE','UNDEFINED')
END) AS ooh_web
FROM EntitySpecification ES,
JSON_TABLE (payloadentityspecification, '$.productSpecificationRelationship[*]'
COLUMNS ( NESTED PATH '$.relationshipCharacteristic[*]'
COLUMNS (
description VARCHAR2(250) PATH '$.description',
name VARCHAR2(250) PATH '$.name',
value VARCHAR2(250) PATH '$.value'
)
)) jt
WHERE payloadentityspecification IS JSON
GROUP BY JSON_VALUE(payloadentityspecification, '$.name')

BigQuery: Append to a nested record

I'm currently checking out BigQuery, and I want to know if it's possible to add new data to a nested table.
For example, if I have a table like this:
[
{
"name": "name",
"type": "STRING"
},
{
"name": "phone",
"type": "RECORD",
"mode": "REPEATED",
"fields": [
{
"name": "number",
"type": "STRING"
},
{
"name": "type",
"type": "STRING"
}
]
}
]
And then I insert a phone number for the contact John Doe.
INSERT into socialdata.phones_examples (name, phone) VALUES("John Doe", [("555555", "Home")]);
Is there an option to later add another number to the contact? To get something like this:
I know I can update the whole field, but I want to know if there is a way to append new values to the nested table.
When you insert data into BigQuery, the granularity is the level of rows, not elements of the arrays contained within rows. You would want to use a query like this, where you update the relevant row and append to the array:
UPDATE socialdata.phones_examples
SET phone = ARRAY_CONCAT(phone, [("555555", "Home")])
WHERE name = "John Doe"
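The row-level semantics can be sketched in Python: the whole phone array of the matching row is replaced by a new array that is the old one plus the new element. This is only a model of the idea; the function name is made up, and the second number is just a sample value.

```python
def append_phone(rows, name, new_phone):
    """Return new rows; matching rows get phone + [new_phone]."""
    return [
        {**row, "phone": row["phone"] + [new_phone]} if row["name"] == name else row
        for row in rows
    ]


table = [
    {"name": "John Doe", "phone": [{"number": "555555", "type": "Home"}]},
]

updated = append_phone(table, "John Doe", {"number": "123-456-7892", "type": "work"})
print(updated[0]["phone"])
```

Nothing is modified in place: as with the UPDATE, the row ends up with a freshly built array value.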
If you need to update records for several users at once, you can use the query below:
#standardSQL
UPDATE `socialdata.phones_examples` t
SET phone = ARRAY_CONCAT(phone, [new_phone])
FROM (
SELECT 'John Doe' name, STRUCT<number STRING, type STRING>('123-456-7892', 'work') new_phone UNION ALL
SELECT 'Abc Xyz' , STRUCT('123-456-7893', 'work') new_phone
) u
WHERE t.name = u.name
or if those updates are available in some table (for example socialdata.phones_updates):
#standardSQL
UPDATE `socialdata.phones_examples` t
SET phone = ARRAY_CONCAT(phone, [new_phone])
FROM `socialdata.phones_updates` u
WHERE t.name = u.name