PostgreSQL regexp_replace square brackets to other format - sql

I have this column text in a table which contains following string
{
  "person": {
    "id": "b01d9bf1-998f-4fa8-879a-0f8d0de4b626",
    "creationDate": [
      2022,
      1,
      22
    ],
    "modificationDate": [
      2022,
      1,
      27
    ]
  }
}
I have the following regexp_matches query:
select regexp_matches('"creationDate": [2022,1,22], "modificationDate": [2022,1,27],', '\[(.[^)]+)\]', 'g')
but I need to replace
"creationDate": [2022,1,22], "modificationDate": [2022,1,27],
to
"creationDate": "2022-01-22", "modificationDate": "2022-01-27",
I'm not very good at working with regular expressions. Also, the difficulty is in adding a leading zero to the month, as you can see.

Regex-based
A nested regex replacement does the trick:
select regexp_replace(
         regexp_replace(
           '"creationDate": [2022,1,22], "modificationDate": [2022,1,27],'
         , '\[(\d+),(\d+),(\d+)\]'
         , '"\1-\2-\3"'
         , 'g'
         )
       , '-(\d)-'
       , '-0\1-'
       , 'g');
The outer replacement only fires when the month is represented by a single digit. (Note that a single-digit day would not be padded by this pattern, since it is not followed by a hyphen.)
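The same two-step substitution can be sanity-checked outside the database; this is an illustrative Python sketch of the identical regex logic, using the sample string from the question:

```python
import re

s = '"creationDate": [2022,1,22], "modificationDate": [2022,1,27],'

# Inner replacement: turn [Y,M,D] into "Y-M-D"
step1 = re.sub(r'\[(\d+),(\d+),(\d+)\]', r'"\1-\2-\3"', s)
# Outer replacement: zero-pad a single digit between hyphens (the month)
step2 = re.sub(r'-(\d)-', r'-0\1-', step1)

print(step2)
# "creationDate": "2022-01-22", "modificationDate": "2022-01-27",
```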
JSON-based
Building on the comment by a_horse_with_no_name, the following query uses JSON operators:
select x.key
     , (x.value ->> 0) || '-' || LPAD(x.value ->> 1, 2, '0') || '-' || LPAD(x.value ->> 2, 2, '0') mydate
from json_each ( '{"creationDate": [2022,1,22], "modificationDate": [2022,1,27] }'::json ) x
;
The query builds a set of records from a JSON object consisting of a key (the JSON property name) and a value of the native JSON datatype, which happens to be an array. The array elements are extracted, padded with leading zeros where appropriate and concatenated.
See the Postgresql docs for JSON operators and functions for more info.
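Loosely, json_each with the ->> operators and LPAD corresponds to the following illustrative Python sketch, iterating the object's key/value pairs and zero-padding month and day:

```python
import json

doc = json.loads('{"creationDate": [2022,1,22], "modificationDate": [2022,1,27]}')

dates = {}
for key, (y, m, d) in doc.items():       # json_each: one record per property
    dates[key] = f"{y}-{m:02d}-{d:02d}"  # ->> extraction plus LPAD-style padding

print(dates)
```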
Full-fledged example
Query to produce a recordset of persons containing their id plus the creation and modification date based on a json array of objects as given in the question.
select id
     , ("creationDate" ->> 0) || '-' || LPAD("creationDate" ->> 1, 2, '0') || '-' || LPAD("creationDate" ->> 2, 2, '0') creation_date
     , ("modificationDate" ->> 0) || '-' || LPAD("modificationDate" ->> 1, 2, '0') || '-' || LPAD("modificationDate" ->> 2, 2, '0') modification_date
from jsonb_to_recordset (
       (
         select jsonb_path_query_array ( orig.j, '$.person' ) part
         from (
           select '[
             { "person": { "id": "b01d9bf1-998f-4fa8-879a-0f8d0de4b626", "creationDate": [2022,1,22], "modificationDate": [2022,1,27] } }
           , { "person": { "id": "deadcafe-998f-4fa8-879a-0f8d0de4b626", "creationDate": [2000,1,1], "modificationDate": [2000,12,31] } }
           ]'::jsonb j
         ) orig
       )
     ) as x( id varchar(50), "creationDate" json, "modificationDate" json )
;
Available live here (dbfiddle.co.uk).

Related

Postgres search JSON by dynamic value

In Postgres 14, I'm trying to query a JSON array element:
{
  "haystack": [
    { "search": "findthis" },
    { "search": "someothervalue" }
  ]
}
This works:
SELECT 1
FROM test
WHERE data->'haystack' #> '[{"search":"findthis"}]';
However, when "findthis" comes from a function: getText() or some other dynamic value, I get 0 results:
SELECT 1
FROM test
WHERE data->'haystack' #> to_jsonb('[{"search":"' || getText()::text || '"}]');
(I am expecting to return 1)
My test:
https://dbfiddle.uk/?rdbms=postgres_14&fiddle=ed2615fe9be7a5b0284065f49fddd36f
What am I missing?
Use a jsonb cast, not to_jsonb. to_jsonb does not parse its text argument; it encodes the text as a single JSON string value, so the containment check never matches. Compare:
select to_jsonb('[{"search":"' || 'findthis' || '"}]'), jsonb('[{"search":"' || 'findthis' || '"}]')
and use your query with a type cast:
SELECT 1
FROM test
WHERE data->'haystack' #> ('[{"search":"' || 'findthis' || '"}]')::jsonb;
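Loosely speaking, to_jsonb applied to text behaves like Python's json.dumps (it encodes the text as a JSON string value), while the ::jsonb cast behaves like json.loads (it parses the text). An illustrative sketch of the distinction:

```python
import json

payload = '[{"search":"findthis"}]'

parsed = json.loads(payload)   # ::jsonb analogue: parse the text into a JSON value
wrapped = json.dumps(payload)  # to_jsonb(text) analogue: quote it as a JSON string

print(parsed[0]["search"])  # findthis
print(wrapped)              # the whole payload wrapped in quotes, not parsed
```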

SQL Generate JSON string using FOR JSON PATH with 2 ROOTs

I have a very simple table containing 5 columns and the table will only hold 1 record at a time. I'm to generate a JSON string from the record and send it to an endpoint.
This is how the JSON string is to be formatted. As you can see it contains 2 'roots', and this is giving me a hard time getting the correct format:
{
  "fields": [
    {
      "fieldName": "Brand",
      "values": [
        "FORD"
      ]
    },
    {
      "fieldName": "Engine",
      "values": [
        "V12"
      ]
    },
    {
      "fieldName": "Location",
      "values": [
        "Monaco"
      ]
    }
  ],
  "categories": [
    {
      "fieldName": "Colour",
      "values": [
        [
          { "name": "Blue" }
        ]
      ]
    },
    {
      "fieldName": "Interior",
      "values": [
        [
          { "name": "Red" }
        ]
      ]
    }
  ]
}
This is my table containing the 5 columns
I have managed to create 2 separate SQL queries to get the JSON string. But I can't figure out how do it in one select.
SELECT (
  SELECT X.* FROM (
    SELECT CASE WHEN CarName IS NOT NULL THEN 'Brand' ELSE NULL END AS fieldName,
           CarName AS [value]
    FROM [dbo].[JSONBODY]
    UNION
    SELECT CASE WHEN Engine IS NOT NULL THEN 'Engine' ELSE NULL END AS fieldName,
           Engine AS [value]
    FROM [dbo].[JSONBODY]
    UNION
    SELECT CASE WHEN [Location] IS NOT NULL THEN 'Location' ELSE NULL END AS fieldName,
           [Location] AS [value]
    FROM [dbo].[JSONBODY]
  ) X
  FOR JSON PATH, ROOT('fields'))
SELECT (
  SELECT Y.* FROM (
    SELECT CASE WHEN Colour IS NOT NULL THEN 'Colour' ELSE NULL END AS fieldName,
           JSON_QUERY('[["' + Colour + '"]]') AS 'value.name'
    FROM [dbo].[JSONBODY]
    UNION
    SELECT CASE WHEN Interior IS NOT NULL THEN 'Interior' ELSE NULL END AS fieldName,
           JSON_QUERY('[["' + Interior + '"]]') AS 'value.name'
    FROM [dbo].[JSONBODY]
  ) Y
  FOR JSON PATH, ROOT('categories'))
And here are the 2 JSON strings:
{"fields":[{"fieldName":"Brand","value":"Ford"},{"fieldName":"Engine","value":"V6"},{"fieldName":"Location","value":"Boston"}]}
{"categories":[{"fieldName":"Colour","value":{"name":[["Blue"]]}},{"fieldName":"Interior","value":{"name":[["Black"]]}}]}
Question 1:
Is it possible to create the JSON string through a single SQL Select? And how can I do it?
Question 2:
If a column value is NULL it is excluded automatically from the JSON string. I had hoped that adding the fieldName expression to the select would also exclude it from the JSON string when the corresponding field was NULL. However, it creates a {}, in the JSON string, and this is not accepted when calling the endpoint. So is there another way to do it when a column value is NULL? I can of course delete it from the JSON string afterwards...
Hope the above makes sense
To do it as a single SELECT you can just UNION ALL the two results together
You can unpivot the values, then check them afterwards for nulls.
Unfortunately, SQL Server does not have JSON_AGG, so you have to bodge it with STRING_AGG and STRING_ESCAPE
SELECT (
    SELECT
        v.fieldName,
        value = JSON_QUERY('[' + STRING_AGG('"' + STRING_ESCAPE(v.value, 'json') + '"', ',') + ']')
    FROM [dbo].[JSONBODY] jb
    CROSS APPLY (VALUES
        ('Brand', jb.Brand),
        ('Engine', jb.Engine),
        ('Location', jb.Location)
    ) v(fieldName, value)
    GROUP BY
        v.fieldName
    FOR JSON PATH, ROOT('fields')
)
UNION ALL
SELECT (
    SELECT
        v.fieldName,
        [value.name] = JSON_QUERY('[[' + STRING_AGG('"' + STRING_ESCAPE(v.value, 'json') + '"', ',') + ']]')
    FROM [dbo].[JSONBODY] jb
    CROSS APPLY (VALUES
        ('Colour', jb.Colour),
        ('Interior', jb.Interior)
    ) v(fieldName, value)
    GROUP BY
        v.fieldName
    FOR JSON PATH, ROOT('categories')
);
If you know you will only ever have one row, you can simplify it by removing the GROUP BY
SELECT (
    SELECT
        v.fieldName,
        value = JSON_QUERY('["' + STRING_ESCAPE(v.value, 'json') + '"]')
    FROM [dbo].[JSONBODY] jb
    CROSS APPLY (VALUES
        ('Brand', jb.Brand),
        ('Engine', jb.Engine),
        ('Location', jb.Location)
    ) v(fieldName, value)
    WHERE v.value IS NOT NULL
    FOR JSON PATH, ROOT('fields')
)
UNION ALL
SELECT (
    SELECT
        v.fieldName,
        [value.name] = JSON_QUERY('[["' + STRING_ESCAPE(v.value, 'json') + '"]]')
    FROM [dbo].[JSONBODY] jb
    CROSS APPLY (VALUES
        ('Colour', jb.Colour),
        ('Interior', jb.Interior)
    ) v(fieldName, value)
    WHERE v.value IS NOT NULL
    FOR JSON PATH, ROOT('categories')
);
db<>fiddle
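The target document shape, including the NULL-skipping behaviour the second question asks about, can be mimicked in illustrative Python (the column names follow the CROSS APPLY lists above; the row values are hypothetical):

```python
import json

row = {"Brand": "Ford", "Engine": "V6", "Location": "Boston",
       "Colour": "Blue", "Interior": None}  # Interior is NULL

# CROSS APPLY (VALUES ...) unpivots the columns into (fieldName, value) pairs;
# the WHERE v.value IS NOT NULL filter drops NULL columns entirely
fields = [{"fieldName": k, "values": [row[k]]}
          for k in ("Brand", "Engine", "Location") if row[k] is not None]
categories = [{"fieldName": k, "values": [[{"name": row[k]}]]}
              for k in ("Colour", "Interior") if row[k] is not None]

print(json.dumps({"fields": fields, "categories": categories}))
```

Note that the NULL Interior column produces no entry at all, rather than an empty {}.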

How to PARSE_JSON from Snowflake Field?

I have a table that looks like:
ID|FIELD1
1|[ { "list": [ {} ] } ]
2|[ { "list": [ { "item": "" } ] } ]
3|[ { "list": [ { "item": "Tag1" }, { "item": "Tag2" } ] } ]
And I want to get all the tags associated to this specific query such that I can just get a list:
Tag1,Tag2
I've tried
SELECT PARSE_JSON(FIELD1[0]['list'][0]['item']) FROM MY_TABLE
WHERE PARSE_JSON(FIELD1[0]['list'][0]) != '{}'
But I get
JSON: garbage in the numeric literal: 65-310 , pos 7
How can I properly unpack these values in SQL?
UPDATE: Clumsy Solution
SELECT LISTAGG(CODES,'\',\'') AS PROMO_CODES
FROM
(SELECT DISTINCT FIELD1[0]['list'][0]['item'] AS CODES FROM MY_TABLE
WHERE FIELD1[0]['list'][0] IS NOT NULL
AND FIELD1[0]['list'][0] != '{}'
AND FIELD1[0]['list'][0]['item'] != ''
)
Please have a look at the knowledge article below to see if it helps in your case:
https://community.snowflake.com/s/article/Dynamically-extracting-JSON-using-LATERAL-FLATTEN
As I see, the Clumsy Solution does not provide the correct result. It shows only Tag1. So here's my solution:
select LISTAGG( v.VALUE:item, ',' ) from MY_TABLE,
lateral flatten (parse_json(FIELD1[0]):list) v
WHERE v.VALUE:item <> '';
I would recommend to add DISTINCT to prevent duplicate tags in the output:
select LISTAGG( DISTINCT v.VALUE:item, ',' ) from MY_TABLE,
lateral flatten (parse_json(FIELD1[0]):list) v
WHERE v.VALUE:item <> '';
If there are more items in the FIELD1 array (i.e. indexes 0, 1, 2, ...), you may use this one:
select LISTAGG( DISTINCT v.VALUE:item, ',' ) from MY_TABLE,
lateral flatten(FIELD1) f,
lateral flatten (parse_json(f.VALUE):list) v
WHERE v.VALUE:item <> '';
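LATERAL FLATTEN effectively cross-joins each row with the elements of an array, so the double-flatten variant corresponds to two nested loops. An illustrative Python analogue, using data shaped like the question's rows:

```python
import json

field1 = json.loads('[{"list": [{}, {"item": ""}, {"item": "Tag1"}, {"item": "Tag2"}]}]')

tags = []
for outer in field1:              # lateral flatten(FIELD1)
    for v in outer["list"]:       # lateral flatten(...:list)
        item = v.get("item")
        if item and item not in tags:  # WHERE item <> '' plus DISTINCT
            tags.append(item)

print(",".join(tags))  # LISTAGG(..., ',')
```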

Db2 nested JSON

I am trying to use Db2 JSON capabilities and in particular nested tables.
CREATE TABLE JSON.TEST1 (COL1 VARBINARY(2000));
INSERT INTO JSON.TEST1 (COL1) VALUES (JSON_TO_BSON(
'{"id" : 103,
"orderDate": "2014-06-20",
"items": {
"item": [ { "partNum": "872-AA",
"productName": "Lawnmower",
"quantity": 1,
"USPrice": 749.99
},
{ "partNum": "837-CM",
"productName": "Digital Camera",
"quantity": 2,
"USPrice": 199.99
}
]
}
}'
));
This works fine, however obviously items in the array are hardcoded references.
SELECT id
     , orderDate
     , product1
     , product2
FROM json.TEST1 AS js,
     JSON_TABLE
     ( js.COL1, 'strict $'
       COLUMNS( id INTEGER PATH '$.id'
              , orderDate DATE PATH '$.orderDate'
              , product1 VARCHAR(32) PATH '$.items.item[0].productName'
              , product2 VARCHAR(32) PATH '$.items.item[1].productName'
              )
       ERROR ON ERROR
     ) AS t
;
The following is what I am trying to get working:
SELECT id
     , orderDate
     , productName
FROM json.TEST1 AS js,
     JSON_TABLE
     ( js.COL1, '$'
       COLUMNS( id INTEGER PATH '$.id'
              , orderDate DATE PATH '$.orderDate'
              , NESTED 'lax $.items.item[]'
                COLUMNS (
                  "productName" VARCHAR(32)
                )
              )
     ) as t;
For reference the error I am receiving
1) [Code: -104, SQL State: 42601] An unexpected token "'lax $.items.item[]'
COLUMNS (
" was found following ",NESTED". Expected tokens may include: "<space>".. SQLCODE=-104, SQLSTATE=42601, DRIVER=4.26.14
2) [Code: -727, SQL State: 56098] An error occurred during implicit system action type "2". Information returned for the error includes SQLCODE "-104", SQLSTATE "42601" and message tokens "'lax $.items.item[]'
COLUMNS (
|,N".. SQLCODE=-727, SQLSTATE=56098, DRIVER=4.26.14
Unfortunately, you must unnest JSON arrays on your own, for example, with Recursive Common Table Expression (RCTE):
-- A table with JSON documents
WITH TAB (DOC_ID, DOC) AS
(
VALUES
(
1,
'{"id" : 103,
"orderDate": "2014-06-20",
"items": {
"item": [ { "partNum": "872-AA",
"productName": "Lawnmower",
"quantity": 1,
"USPrice": 749.99
},
{ "partNum": "837-CM",
"productName": "Digital Camera",
"quantity": 2,
"USPrice": 199.99
}
]
}
}'
)
)
-- get a JSON array only for each record
, ITEMS_ARRAY (DOC_ID, ITEMS) AS
(
SELECT DOC_ID, JSON_OBJECT(KEY 'items' VALUE JSON_QUERY(DOC, '$.items.item') FORMAT JSON)
FROM TAB
)
-- Use RCTE to unnest it
, ITEMS (DOC_ID, INDEX, ITEM) AS
(
SELECT DOC_ID, 0, JSON_QUERY(ITEMS, '$.items[0]')
FROM ITEMS_ARRAY
WHERE JSON_EXISTS(ITEMS, '$.items[0]')
UNION ALL
SELECT I.DOC_ID, I.INDEX+1, JSON_QUERY(A.ITEMS, '$.items['|| TRIM(I.INDEX+1) ||']')
FROM ITEMS I, ITEMS_ARRAY A
WHERE I.DOC_ID = A.DOC_ID AND JSON_EXISTS(A.ITEMS, '$.items['|| TRIM(I.INDEX+1) ||']')
)
SELECT D.*, IT.*
--, I.*
FROM TAB T
JOIN ITEMS I ON I.DOC_ID = T.DOC_ID
-- array element to row
CROSS JOIN JSON_TABLE
(
I.ITEM, 'strict $' COLUMNS
(
PARTNUM VARCHAR(20) PATH '$.partNum'
, PRODCUCTNAME VARCHAR(20) PATH '$.productName'
, QUANTITY INT PATH '$.quantity'
, USPRICE DECFLOAT PATH '$.USPrice'
) ERROR ON ERROR
) IT
-- other elements of original JSON to row
CROSS JOIN JSON_TABLE
(
T.DOC, 'strict $' COLUMNS
(
ID INT PATH '$.id'
, ORDERDATE DATE PATH '$.orderDate'
) ERROR ON ERROR
) D
;
The result is:
|ID |ORDERDATE |PARTNUM|PRODCUCTNAME |QUANTITY|USPRICE|
|---|----------|-------|--------------|--------|-------|
|103|2014-06-20|872-AA |Lawnmower |1 |749.99 |
|103|2014-06-20|837-CM |Digital Camera|2 |199.99 |
db<>fiddle example.
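The RCTE walks the array by probing index 0, 1, 2, ... until JSON_EXISTS fails, emitting one (INDEX, ITEM) row per element. An illustrative Python rendering of that index walk (the document is abbreviated from the example above):

```python
import json

doc = json.loads('{"items": {"item": [{"partNum": "872-AA"}, {"partNum": "837-CM"}]}}')
arr = doc["items"]["item"]

rows, idx = [], 0
while idx < len(arr):             # JSON_EXISTS('$.items[idx]') probe
    rows.append((idx, arr[idx]))  # the anchor/recursive members emit (INDEX, ITEM)
    idx += 1

print(rows)
```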
Update
It's convenient to create a generic function suitable for any JSON array:
-- WITH A GENERIC TABLE FUNCTION
CREATE OR REPLACE FUNCTION UNNEST_JSON (P_DOC CLOB(1M), P_PATH VARCHAR(128))
RETURNS TABLE
(
INDEX INT
, ITEM CLOB(1M)
)
RETURN
WITH ITEMS_ARRAY (ITEMS) AS
(
VALUES JSON_OBJECT(KEY 'items' VALUE JSON_QUERY(P_DOC, P_PATH) FORMAT JSON)
)
, ITEMS (INDEX, ITEM) AS
(
SELECT 0, JSON_QUERY(ITEMS, '$.items[0]')
FROM ITEMS_ARRAY
WHERE JSON_EXISTS(ITEMS, '$.items[0]')
UNION ALL
SELECT I.INDEX+1, JSON_QUERY(A.ITEMS, '$.items['|| TRIM(I.INDEX+1) ||']')
FROM ITEMS I, ITEMS_ARRAY A
WHERE JSON_EXISTS(A.ITEMS, '$.items['|| TRIM(I.INDEX+1) ||']')
)
SELECT INDEX, ITEM
FROM ITEMS
#
Such a generic function simplifies the solution:
WITH TAB (DOC_ID, DOC) AS
(
VALUES
(
1,
'{"id" : 103,
"orderDate": "2014-06-20",
"items": {
"item": [ { "partNum": "872-AA",
"productName": "Lawnmower",
"quantity": 1,
"USPrice": 749.99
},
{ "partNum": "837-CM",
"productName": "Digital Camera",
"quantity": 2,
"USPrice": 199.99
}
]
}
}'
)
,
(
2,
'{"id" : 203,
"orderDate": "2014-06-20",
"items": {
"item": [ { "partNum": "002-AA",
"productName": "Lawnmower",
"quantity": 10,
"USPrice": 749.99
},
{ "partNum": "002-BB",
"productName": "Digital Camera",
"quantity": 20,
"USPrice": 199.99
}
]
}
}'
)
)
SELECT T.DOC_ID, A.INDEX, D.*, IT.*
FROM
TAB T
-- unnesting
, TABLE(UNNEST_JSON(T.DOC, '$.items.item')) A
-- array element to row
, JSON_TABLE
(
A.ITEM, 'strict $' COLUMNS
(
PARTNUM VARCHAR(20) PATH '$.partNum'
, PRODCUCTNAME VARCHAR(20) PATH '$.productName'
, QUANTITY INT PATH '$.quantity'
, USPRICE DECFLOAT PATH '$.USPrice'
) ERROR ON ERROR
) IT
-- other elements of original JSON to row
, JSON_TABLE
(
T.DOC, 'strict $' COLUMNS
(
ID INT PATH '$.id'
, ORDERDATE DATE PATH '$.orderDate'
) ERROR ON ERROR
) D;
The result is:
|DOC_ID|INDEX|ID |ORDERDATE |PARTNUM|PRODCUCTNAME |QUANTITY|USPRICE|
|------|-----|---|----------|-------|--------------|--------|-------|
|1 |0 |103|2014-06-20|872-AA |Lawnmower |1 |749.990|
|1 |1 |103|2014-06-20|837-CM |Digital Camera|2 |199.990|
|2 |0 |203|2014-06-20|002-AA |Lawnmower |10 |749.990|
|2 |1 |203|2014-06-20|002-BB |Digital Camera|20 |199.990|
Here are a couple of UDFs with the same functionality which should work faster, since they don't use an RCTE. An example of their use is in another, older answer of mine here.
-- Uses XML, should work in all environments
CREATE OR REPLACE FUNCTION UNNEST_JSON2 (P_DOC CLOB(1M), P_PATH VARCHAR(128))
RETURNS TABLE
(
INDEX INT
, ITEM CLOB (1M)
)
DETERMINISTIC
NO EXTERNAL ACTION
BEGIN ATOMIC
DECLARE L_IDX INT DEFAULT 0;
DECLARE L_XML XML;
L1:
WHILE TRUE DO
IF NOT JSON_EXISTS (P_DOC, P_PATH || '[' || L_IDX || ']') THEN LEAVE L1; END IF;
SET (L_XML, L_IDX) =
(
XMLCONCAT (L_XML, XMLELEMENT (NAME "A", JSON_QUERY (P_DOC, P_PATH || '[' || L_IDX || ']')))
, L_IDX + 1
);
END WHILE L1;
RETURN
SELECT SEQ - 1, T.ITEM
FROM XMLTABLE
(
'$D' PASSING L_XML AS "D"
COLUMNS
SEQ FOR ORDINALITY
, ITEM CLOB (1M) PATH '.'
) T
WHERE L_XML IS NOT NULL;
END
#
-- Doesn't work in DPF environment, but should be the fastest one
CREATE OR REPLACE FUNCTION UNNEST_JSON3 (P_DOC CLOB(1M), P_PATH VARCHAR(128))
RETURNS TABLE
(
INDEX INT
, ITEM CLOB (1M)
)
DETERMINISTIC
NO EXTERNAL ACTION
BEGIN
DECLARE L_IDX INT DEFAULT 0;
L1:
WHILE TRUE DO
IF NOT JSON_EXISTS (P_DOC, P_PATH || '[' || L_IDX || ']') THEN LEAVE L1; END IF;
PIPE (L_IDX, JSON_QUERY (P_DOC, P_PATH || '[' || L_IDX || ']'));
SET L_IDX = L_IDX + 1;
END WHILE L1;
RETURN;
END
#
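The PIPE-based UNNEST_JSON3 streams each element as soon as it is found, much like a Python generator. A rough analogue, with the path handling simplified to a list of keys:

```python
import json

def unnest_json(doc, path_keys):
    """Yield (index, element) for each element of the array at path_keys."""
    node = json.loads(doc)
    for key in path_keys:      # walk down to the target array
        node = node[key]
    idx = 0
    while idx < len(node):     # JSON_EXISTS-style probe
        yield idx, node[idx]   # PIPE (L_IDX, JSON_QUERY(...))
        idx += 1

pairs = list(unnest_json('{"items": {"item": [{"p": 1}, {"p": 2}]}}',
                         ["items", "item"]))
print(pairs)
```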

Oracle - json data contains array/string interchangeably

I have JSON data and I am trying to put that data into different columns in Oracle.
The issue is that one of the columns sometimes contains an array and sometimes a string.
I know there is a different command for putting a JSON array into a column, but if the column is populated with a string sometimes and an array at other times, how do I write the SQL so that it fetches all the data?
SELECT id, array1
FROM (
       select '{
         "data": [
           {
             "id": 1,
             "array1": [ "INFO", "ABC", ]
           },
           {
             "id": 2,
             "array1": "TEST",
           }
         ]
       }' AS JSON_DATA
       FROM DUAL
     ) I,
     json_table(
       i.JSON_DATA,
       '$.data[*]'
       COLUMNS (
         array1 varchar2(4000) FORMAT JSON path '$."array1"',
         ID varchar2(4000) path '$."id"'
       )
     ) a
Output from the sql:
ID ARRAY1
1 ["INFO","ABC"]
2
Desired Ouput :
ID ARRAY1
1 ["INFO","ABC"]
2 TEST
array1 varchar2(4000) PATH '$."array1"' can be combined with array1_ varchar2(4000) FORMAT JSON PATH '$."array1"', since both cases exist among the values of the array1 key: the plain PATH column yields the scalar strings (and NULL for arrays), while the FORMAT JSON column yields the arrays (and NULL for scalars). So, use:
SELECT ID, NVL(array1, array1_) AS array1
FROM
(
  SELECT '{
    "data": [
      {
        "id": 1,
        "array1": [ "INFO", "ABC" ]
      },
      {
        "id": 2,
        "array1": "TEST"
      }
    ]
  }' AS JSON_DATA
  FROM DUAL
) I
CROSS JOIN
JSON_TABLE(
  i.JSON_DATA,
  '$.data[*]'
  COLUMNS (
    array1 varchar2(4000) PATH '$."array1"',
    array1_ varchar2(4000) FORMAT JSON PATH '$."array1"',
    ID varchar2(4000) PATH '$."id"'
  )
) A
Demo
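The NVL over the two JSON_TABLE columns picks whichever representation is non-NULL for each row; an illustrative Python sketch of the same branching, using the question's data:

```python
import json

data = json.loads('{"data": [{"id": 1, "array1": ["INFO", "ABC"]},'
                  ' {"id": 2, "array1": "TEST"}]}')

rows = []
for rec in data["data"]:
    v = rec["array1"]
    # plain PATH column: the scalar, or NULL when the value is an array
    scalar = v if isinstance(v, str) else None
    # FORMAT JSON column: the array serialized as text, or NULL for a scalar
    as_json = json.dumps(v, separators=(",", ":")) if isinstance(v, list) else None
    rows.append((rec["id"], scalar or as_json))   # NVL(array1, array1_)

print(rows)
```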