Replacing substring with variables in SQL

I am currently figuring out how to do a somewhat complex data migration in my database, and whether it is even possible in plain SQL (I am not a very experienced SQL developer myself).
Let's say that I store JSON in one of the text columns of a Postgres table, with roughly the following format:
{"type":"something","params":[{"value":"00de1be5-f75b-4072-ba30-c67e4fdf2333"}]}
Now, I would like to migrate the value part to a bit more complex format:
{"type":"something","params":[{"value":{"id":"00de1be5-f75b-4072-ba30-c67e4fdf2333","path":"/hardcoded/string"}}]}
Furthermore, I also need to check whether the value matches a UUID pattern, and if not, use a slightly different structure:
{"type":"something-else","params":[{"value":"not-id"}]} ---> {"type":"something-else","params":[{"value":{"value":"not-id","path":""}}]}
I know I can define a procedure and use REGEXP_REPLACE: REGEXP_REPLACE(source, pattern, replacement_string [, flags]), but I have no idea how to approach deciding whether the content contains an ID or not. Could someone suggest at least a direction or a hint for how to do this?
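As a hint for the UUID part: Postgres's case-insensitive regex match operator ~* already returns a boolean, so it can drive a CASE expression directly and no procedure is strictly needed. A minimal sketch (the literal is just the UUID from the question; the pattern is a generic 8-4-4-4-12 hex shape, not specific to your data):
-- returns true: the string matches the generic shape of a UUID
select '00de1be5-f75b-4072-ba30-c67e4fdf2333'
       ~* '^[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}$' as is_uuid;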

You can use the jsonb functions to extract the data and transform it, then merge the modified params array back into the original object.
Sample data structure and query result: dbfiddle
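For reference, a minimal test setup matching the queries below might look like this (the table name test and text column data are assumptions inferred from the query; the rows are the two sample documents from the question):
create table test (data text);
insert into test (data) values
  ('{"type":"something","params":[{"value":"00de1be5-f75b-4072-ba30-c67e4fdf2333"}]}'),
  ('{"type":"something-else","params":[{"value":"not-id"}]}');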
select
  (t.data::jsonb || jsonb_build_object(
    'params',
    jsonb_agg(
      jsonb_build_object(
        'value',
        case
          when e.value ->> 'value' ~* '^[0-9A-F]{8}-[0-9A-F]{4}-4[0-9A-F]{3}-[89AB][0-9A-F]{3}-[0-9A-F]{12}$' then
            jsonb_build_object('id', e.value ->> 'value', 'path', '/hardcoded/string')
          else
            jsonb_build_object('value', e.value ->> 'value', 'path', '')
        end
      )
    )
  ))::text
from
  test t
  cross join jsonb_array_elements(t.data::jsonb -> 'params') e
group by t.data
PS:
If your table has an id or another unique column, you can replace group by t.data with that key and do things like this:
select
  (t.data::jsonb || jsonb_build_object(
    'params',
    jsonb_agg(
      jsonb_build_object(
        'value',
        case
          when e.value ->> 'value' ~* '^[0-9A-F]{8}-[0-9A-F]{4}-4[0-9A-F]{3}-[89AB][0-9A-F]{3}-[0-9A-F]{12}$' then
            jsonb_build_object('id', e.value ->> 'value', 'path', '/hardcoded/string')
          else
            jsonb_build_object('value', e.value ->> 'value', 'path', '')
        end
      )
    )
  ))::text
from
  test t
  cross join jsonb_array_elements(t.data::jsonb -> 'params') e
group by t.id
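Since the goal is a migration rather than a one-off select, the grouped-by-id variant can be turned into an UPDATE. A hedged sketch, assuming the table really has an id key column (untested; adapt names as needed):
update test t
set data = m.migrated
from (
  select t2.id,
         (t2.data::jsonb || jsonb_build_object(
           'params',
           jsonb_agg(
             jsonb_build_object(
               'value',
               case
                 when e.value ->> 'value' ~* '^[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}$' then
                   jsonb_build_object('id', e.value ->> 'value', 'path', '/hardcoded/string')
                 else
                   jsonb_build_object('value', e.value ->> 'value', 'path', '')
               end
             )
           )
         ))::text as migrated
  from test t2
  cross join jsonb_array_elements(t2.data::jsonb -> 'params') e
  group by t2.id, t2.data
) m
where m.id = t.id;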

To replace values at any depth, you can use a recursive CTE that runs a replacement for each value of a "value" key, using a conditional to check whether the value is a UUID and producing the proper JSON object accordingly:
with recursive cte(v, i, js) as (
  -- anchor: collect all distinct "value" strings into a json array v, start the counter i at 0
  select (select array_to_json(array_agg(distinct t.i))
          from (select (regexp_matches(js, '"value":("[\w\-]+")', 'g'))[1] i) t),
         0,
         js
  from (select '{"type":"something","params":[{"value":"00de1be5-f75b-4072-ba30-c67e4fdf2333"}, {"value":"sdfsa"}]}' js) t1
  union all
  -- recursive step: replace the i-th collected value with the proper object,
  -- then strip the quotes that would otherwise wrap the injected object
  select c.v, c.i + 1,
         regexp_replace(
           regexp_replace(c.js,
                          regexp_replace((c.v -> c.i)::text, '[\\"]+', '', 'g'),
                          case when not ((c.v -> c.i)::text ~ '\w+\-\w+\-\w+\-\w+\-\w+') then
                            json_build_object('value', regexp_replace((c.v -> c.i)::text, '[\\"]+', '', 'g'), 'path', '')::text
                          else
                            json_build_object('id', regexp_replace((c.v -> c.i)::text, '[\\"]+', '', 'g'), 'path', '/hardcoded/path')::text
                          end,
                          'g'),
           '(")(?=\{)|(?<=\})(")', '', 'g')
  from cte c
  where c.i < json_array_length(c.v)
)
select js from cte order by i desc limit 1
Output:
{"type":"something","params":[{"value":{"id" : "00de1be5-f75b-4072-ba30-c67e4fdf2333", "path" : "/hardcoded/path"}}, {"value":{"value" : "sdfsa", "path" : ""}}]}
On a more complex JSON input string:
{"type":"something","params":[{"value":"00de1be5-f75b-4072-ba30-c67e4fdf2333"}, {"value":"sdfsa"}, {"more":[{"additional":[{"value":"00f41be5-g75b-4072-ba30-c67e4fdf3777"}]}]}]}
Output:
{"type":"something","params":[{"value":{"id" : "00de1be5-f75b-4072-ba30-c67e4fdf2333", "path" : "/hardcoded/path"}}, {"value":{"value" : "sdfsa", "path" : ""}}, {"more":[{"additional":[{"value":{"id" : "00f41be5-g75b-4072-ba30-c67e4fdf3777", "path" : "/hardcoded/path"}}]}]}]}

Related

Extract complex json with random key field

I am trying to extract the following JSON into its own rows, like the table below, in a Presto query. The issue here is that the name of the key (the AV engine name) is different for each row, and I am stuck on how to extract and iterate over the keys without knowing them in advance.
The JSON is the value of a table column:
{
  "Bkav": {
    "detected": false,
    "result": null
  },
  "Lionic": {
    "detected": true,
    "result": "Trojan.Generic.3611249"
  },
  ...
}
AV Engine Name | Detected Virus | Result
Bkav           | false          | null
Lionic         | true           | Trojan.Generic.3611249
I have tried to use json_extract following the documentation here https://teradata.github.io/presto/docs/141t/functions/json.html, but there is no mention of extraction when we don't know the key. I am trying to find a solution that works in both Presto and Hive; is there a common query applicable to both?
You can cast your JSON to map(varchar, json) and flatten it with unnest:
-- sample data
WITH dataset (json_str) AS (
  VALUES (
    '{"Bkav":{"detected": false,"result": null},"Lionic":{"detected": true,"result": "Trojan.Generic.3611249"}}'
  )
)
-- query
select
  k "AV Engine Name",
  json_extract_scalar(v, '$.detected') "Detected Virus",
  json_extract_scalar(v, '$.result') "Result"
from (
  select cast(json_parse(json_str) as map(varchar, json)) as m
  from dataset
)
cross join unnest (map_keys(m), map_values(m)) t(k, v)
Output:
AV Engine Name | Detected Virus | Result
Bkav           | false          |
Lionic         | true           | Trojan.Generic.3611249
The Presto query suggested by @Guru works, but for Hive there is no easy way. I had to:
extract the JSON,
parse it with replace to remove some characters and brackets,
then convert it back to a map, and repeat once more to get the nested value out:
SELECT
av_engine,
str_to_map(regexp_replace(engine_result, '\\}', ''),',', ':') AS output_map
FROM (
SELECT
str_to_map(regexp_replace(regexp_replace(get_json_object(raw_response, '$.scans'), '\"', ''), '\\{',''),'\\},', ':') AS key_val_map
FROM restricted_antispam.abuse_malware_scanning
) AS S
LATERAL VIEW EXPLODE(key_val_map) temp AS av_engine, engine_result

PostgreSQL: select from field with json format

Table has column, named "config" with following content:
{
"A":{
"B":[
{"name":"someName","version":"someVersion"},
{"name":"someName","version":"someVersion"}
]
}
}
The task is to select all name and version values. The expected output is a selection with two columns: name and version.
I successfully select the content of B:
select config::json -> 'A' -> 'B' as B
from my_table;
But when I'm trying to do something like:
select config::json -> 'A' -> 'B' ->> 'name' as name,
config::json -> 'A' -> 'B' ->> 'version' as version
from my_table;
I receive a selection where both columns contain only empty (null) values.
If the array size is fixed, you just need to specify which element of the array you want to retrieve, e.g.:
SELECT config->'A'->'B'->0->>'name' AS name,
config->'A'->'B'->0->>'version' AS version
FROM my_table;
But as your array contains multiple elements, use the function jsonb_array_elements in a subquery or CTE and, in the outer query, parse each element individually, e.g.:
SELECT rec->>'name', rec->>'version'
FROM (SELECT jsonb_array_elements(config->'A'->'B')
FROM my_table) j (rec);
Demo: db<>fiddle
First, you should use the jsonb data type instead of json; see the documentation:
In general, most applications should prefer to store JSON data as
jsonb, unless there are quite specialized needs, such as legacy
assumptions about ordering of object keys.
Using jsonb, you can do the following:
SELECT DISTINCT ON (c) c->'name' AS name, c->'version' AS version
FROM my_table
CROSS JOIN LATERAL jsonb_path_query(config :: jsonb, '$.** ? (exists(#.name))') AS c
dbfiddle
select e.value ->> 'name', e.value ->> 'version'
from my_table
cross join json_array_elements(config::json -> 'A' -> 'B') e

Select rows where nested json array field includes any of values from a provided array in PostgreSQL?

I'm trying to write an SQL query that finds the rows in a table matching any value of a provided JSON array.
To put it more concretely, I have the following db table:
CREATE TABLE mytable (
name text,
id SERIAL PRIMARY KEY,
config json,
matching boolean
);
INSERT INTO "mytable"(
  "name", "id", "config", "matching"
)
VALUES
  (
    E'Name 1', 50,
    E'{"employees":[1,7],"industries":["1","3","4","13","14","16"],"levels":["1110","1111","1112","1113","1114"],"revenue":[0,5],"states":["AK","Al","AR","AZ","CA","CO","CT","DC","DE","FL","GA","HI","IA","ID","IL"]}',
    TRUE
  ),
  (
    E'Name 2', 63,
    E'{"employees":[3,5],"industries":["1"],"levels":["1110"],"revenue":[2,5],"states":["AK","AZ","CA","CO","HI","ID","MT","NM","NV","OR","UT","WA","WY"]}',
    TRUE
  ),
  (
    E'Name 3', 56,
    E'{"employees":[0,0],"industries":["14"],"levels":["1111"],"revenue":[7,7],"states":["AK","AZ","CA","CO","HI","ID","MT","NM","NV","OR","UT","WA","WY"]}',
    TRUE
  ),
  (
    E'Name 4', 61,
    E'{"employees":[3,8],"industries":["1"],"levels":["1110"],"revenue":[0,5],"states":["AK","AZ","CA","CO","HI","ID","WA","WY"]}',
    FALSE
  );
I need to perform search queries on this table with the given filtering params. The filtering params basically correspond to the json keys in config field. They come from the client side and can look something like this:
{"employees": [1, 8], "industries": ["12", "5"]}
{"states": ["LA", "WA", "CA"], "levels": ["1100", "1100"], "employees": [3]}
And given such filters, I need to find the rows in my table that include any of the array elements from the corresponding filter key for every filter key provided.
So given the filter {"employees": [1, 8], "industries": ["12", "5"]} the query would have to return all the rows where (employees key in config field contains either 1 or 8 AND where industries key in config field contains either 12 or 5);
I need to generate such a query dynamically from JavaScript code so that I can include/exclude filtering by a certain parameter by adding/removing the corresponding AND clause.
What I have so far is a super long-running query that generates all possible combinations of array elements in the config field, which feels very wrong:
select * from mytable
cross join lateral json_array_elements(config->'employees') as e1
cross join lateral json_array_elements(config->'states') as e2
cross join lateral json_array_elements(config->'levels') as e3
cross join lateral json_array_elements(config->'revenue') as e4;
I've also tried to do something like this:
select * from mytable
where
matching = TRUE
and (config->'employees')::jsonb #> ANY(ARRAY ['[1, 7, 8]']::jsonb[])
and (config->'states')::jsonb #> ANY(ARRAY ['["AK", "AZ"]']::jsonb[])
and ........;
however, this didn't work, although it looked promising.
Also, I've tried playing with the ?| operator, but to no avail.
Basically, what I need is: given an array key in a json field, check whether that field contains any of the values provided in another array (my filtering parameter); and I have to do this for multiple filtering parameters dynamically.
So the logic is the following:
select all rows from the table
*where*
matching = TRUE
*and* config->key1 includes any of the keys from [5,6,8,7]
*and* config->key2 includes any of the keys from [8,6,2]
*and* so forth;
Could you help me implement such an SQL query?
Or will such SQL queries always be extremely slow, making it best to do this kind of filtering outside of the database?
I'd try something like the following. I guess there are certain side effects (e.g. what happens if the comparison data is empty?) and I didn't test it on larger data sets; it was just the first thing that came to my mind:
demo:db<>fiddle
SELECT
*
FROM
mytable t
JOIN (SELECT '{"states": ["LA", "WA", "CA"], "levels": ["1100", "1100"], "employees": [3]}'::json as data) c
ON
CASE WHEN c.data -> 'employees' IS NOT NULL THEN
ARRAY(SELECT json_array_elements_text(t.config -> 'employees')) && ARRAY(SELECT json_array_elements_text(c.data -> 'employees'))
ELSE TRUE END
AND
CASE WHEN c.data -> 'industries' IS NOT NULL THEN
ARRAY(SELECT json_array_elements_text(t.config -> 'industries')) && ARRAY(SELECT json_array_elements_text(c.data -> 'industries'))
ELSE TRUE END
AND
CASE WHEN c.data -> 'states' IS NOT NULL THEN
ARRAY(SELECT json_array_elements_text(t.config -> 'states')) && ARRAY(SELECT json_array_elements_text(c.data -> 'states'))
ELSE TRUE END
AND
CASE WHEN c.data -> 'revenue' IS NOT NULL THEN
ARRAY(SELECT json_array_elements_text(t.config -> 'revenue')) && ARRAY(SELECT json_array_elements_text(c.data -> 'revenue'))
ELSE TRUE END
AND
CASE WHEN c.data -> 'levels' IS NOT NULL THEN
ARRAY(SELECT json_array_elements_text(t.config -> 'levels')) && ARRAY(SELECT json_array_elements_text(c.data -> 'levels'))
ELSE TRUE END
Explanation of the join condition:
CASE WHEN c.data -> 'levels' IS NOT NULL THEN
ARRAY(SELECT json_array_elements_text(t.config -> 'levels')) && ARRAY(SELECT json_array_elements_text(c.data -> 'levels'))
ELSE TRUE END
If your comparison data does not contain the specific attribute, the condition is true and the attribute is therefore ignored. If it does contain the attribute, compare the table's array and the comparison array for that attribute by transforming both JSON arrays into plain Postgres arrays.
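The heavy lifting here is done by the && array overlap operator, which returns true when two arrays share at least one element. A quick standalone illustration with made-up state codes:
-- true: both arrays contain 'AZ'
select array['AK','AZ'] && array['CA','AZ'];
-- false: no common element
select array['AK','AZ'] && array['CA','WA'];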

Asking for help on the correct way to use SQL with a CTE to create a JSON_OBJECT

The requested JSON needs to be in this form:
{
"header": {
"InstanceName": "US"
},
"erpReferenceData": {
"erpReferences": [
{
"ServiceID": "fb16e421-792b-4e9c-935b-3cea04a84507",
"ERPReferenceID": "J0000755"
},
{
"ServiceID": "7d13d907-0932-44c0-ad81-600c9b97b6e5",
"ERPReferenceID": "J0000756"
}
]
}
}
The program that I created looks like this:
dcl-s OutFile sqltype(dbclob_file);
exec sql
  With x as (
    select json_object(
             'InstanceName' : trim(Cntry) ) objHeader
    from xmlhdr
    where cntry = 'US'),
  y as (
    select json_object(
             'ServiceID' VALUE S.ServiceID,
             'ERPReferenceID' VALUE I.RefCod) objRef
    FROM IMH I
    INNER JOIN GUIDS G ON G.REFCOD = I.REFCOD
    INNER JOIN SERV S ON S.GUID = G.GUID
    WHERE G.XMLTYPE = 'Service')
  VALUES (
    select json_object('header' : objHeader Format json,
                       'erpReferenceData' : json_object(
                         'erpReferences' VALUE
                           JSON_ARRAYAGG(
                             ObjRef Format json)))
    from x
    LEFT OUTER JOIN y ON 1=1
    Group by objHeader)
  INTO :OutFile;
This is the compile error I get:
SQL0122: Position 41 Column OBJHEADER or expression in SELECT list not valid.
I am asking if this is the correct way to create this SQL statement, is there a better easier way? Any idea how to rewrite the SQL statement to make it work correctly?
The key to generating JSON (or XML, for that matter) is to start from the inside and work your way out.
(I've simplified the raw data into just a test table...)
with elm as(select json_object
('ServiceID' VALUE ServiceID,
'ERPReferenceID' VALUE RefCod) as erpRef
from jsontst)
select * from elm;
Now add the next layer as a CTE that builds on the first CTE:
with elm as(select json_object
('ServiceID' VALUE ServiceID,
'ERPReferenceID' VALUE RefCod) as erpRef
from jsontst)
, arr (arrDta) as (values json_array (select erpRef from elm))
select * from arr;
And the next layer...
with elm as(select json_object
('ServiceID' VALUE ServiceID,
'ERPReferenceID' VALUE RefCod) as erpRef
from jsontst)
, arr (arrDta) as (values json_array (select erpRef from elm))
, erpReferences (refs) as ( select json_object
('erpReferences' value arrDta )
from arr)
select *
from erpReferences;
A nice thing about building with CTEs is that at each step you can see the results so far.
You can always go back and stick a select * from cte; in the middle to see what you have at any point.
Note that I'm building this in Run SQL Scripts. Once you have the statement complete, you can embed it in your RPG program.
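To finish the example, one more layer wraps the header and the references together. A sketch under the same simplified test table, with the header hard-coded here purely for illustration (the real query would build it from xmlhdr as in the question):
with elm as(select json_object
('ServiceID' VALUE ServiceID,
'ERPReferenceID' VALUE RefCod) as erpRef
from jsontst)
, arr (arrDta) as (values json_array (select erpRef from elm))
, erpReferences (refs) as ( select json_object
('erpReferences' value arrDta )
from arr)
-- hypothetical final layer: combine the header and the references object
select json_object
('header' value json_object('InstanceName' value 'US'),
'erpReferenceData' value refs format json)
from erpReferences;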

SQL Join with operator IN

I'm having a problem joining two tables using IN.
Example:
with nodes(node_id, mpath) as (
SELECT node_id, drugs_cls_node_view.mpath
FROM drugs_cls_entries_view
inner join drugs_cls_node_view on drugs_cls_node_view.id = node_id
WHERE mnn_id in (13575)
)
select DISTINCT n.node_id, drugs_cls_node_view.*
from nodes n
inner join drugs_cls_node_view
on drugs_cls_node_view.id in (array_replace(string_to_array(n.mpath, '/'), '', '0')::bigint[])
I get the exception:
ERROR: operator does not exist: bigint = bigint[]
With
on drugs_cls_node_view.id in
(array_replace(string_to_array(n.mpath, '/'), '', '0')::bigint[])
you look for the ID in a set containing just one element. This element is an array. The ID can never equal the array, hence the error.
You must unnest the array to have single values to compare with:
on drugs_cls_node_view.id in
   (select unnest(array_replace(string_to_array(n.mpath, '/'), '', '0')::bigint[]))
Or use ANY on the array instead of IN:
on drugs_cls_node_view.id = ANY
(array_replace(string_to_array(n.mpath, '/'), '', '0')::bigint[])
There may be syntax errors in my code, as I am no Postgres guy, but it should work with maybe a little correction here or there :-)
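As a quick standalone check of the = ANY form (the path string '1/2//3' is a made-up illustration; note how the empty segment becomes '0' before the cast, just as in the query above):
-- true: 2 is among {1,2,0,3} built from the path string
select 2 = any (array_replace(string_to_array('1/2//3', '/'), '', '0')::bigint[]);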