Parse Json - CTE & filtering - sql

I need to remove a few records (that contain t) in order to parse/flatten the data column. The query in the CTE that creates 'tab', works independent but when inside the CTE i get the same error while trying to parse json, if I were not have tried to filter out the culprit.
with tab as (
select * from table
where data like '%t%')
select b.value::string, a.* from tab a,
lateral flatten( input => PARSE_JSON( a.data) ) b ;
;
error:
Error parsing JSON: unknown keyword "test123", pos 8
example data:
Date Data
1-12-12 {id: 13-43}
1-12-14 {id: 43-43}
1-11-14 {test12}
1-11-14 {test2}
1-02-14 {id: 44-43}

It is possible to replace PARSE_JSON(a.data) with TRY_PARSE_JSON(a.data) which will produce NULL instead of error for invalid input.
More at: TRY_PARSE_JSON

Related

Extract complex json with random key field

I am trying to extract the following JSON into its own rows like the table below in Presto query. The issue here is the name of the key/av engine name is different for each row, and I am stuck on how I can extract and iterate on the keys without knowing the value of the key.
The json is a value of a table row
{
"Bkav":
{
"detected": false,
"result": null,
},
"Lionic":
{
"detected": true,
"result": Trojan.Generic.3611249',
},
...
AV Engine Name
Detected Virus
Result
Bkav
false
null
Lionic
true
Trojan.Generic.3611249
I have tried to use json_extract following the documentation here https://teradata.github.io/presto/docs/141t/functions/json.html but there is no mention of extraction if we don't know the key :( I am trying to find a solution that works in both presto & hive query, is there a common query that is applicable to both?
You can cast your json to map(varchar, json) and process it with unnest to flatten:
-- sample data
WITH dataset (json_str) AS (
VALUES (
'{"Bkav":{"detected": false,"result": null},"Lionic":{"detected": true,"result": "Trojan.Generic.3611249"}}'
)
)
--query
select k "AV Engine Name", json_extract_scalar(v, '$.detected') "Detected Virus", json_extract_scalar(v, '$.result') "Result"
from (
select cast(json_parse(json_str) as map(varchar, json)) as m
from dataset
)
cross join unnest (map_keys(m), map_values(m)) t(k, v)
Output:
AV Engine Name
Detected Virus
Result
Bkav
false
Lionic
true
Trojan.Generic.3611249
The presto query suggested by #Guru works, but for hive, there is no easy way.
I had to extract the json
Parse it with replace to remove some character and bracket
Then convert it back to a map, and repeat for one more time to get the nested value out
SELECT
av_engine,
str_to_map(regexp_replace(engine_result, '\\}', ''),',', ':') AS output_map
FROM (
SELECT
str_to_map(regexp_replace(regexp_replace(get_json_object(raw_response, '$.scans'), '\"', ''), '\\{',''),'\\},', ':') AS key_val_map
FROM restricted_antispam.abuse_malware_scanning
) AS S
LATERAL VIEW EXPLODE(key_val_map) temp AS av_engine, engine_result

PostgreSQL: select from field with json format

Table has column, named "config" with following content:
{
"A":{
"B":[
{"name":"someName","version":"someVersion"},
{"name":"someName","version":"someVersion"}
]
}
}
The task is to select all name and version values. The output is expected selection with 2 columns: name and value.
I successfully select the content of B:
select config::json -> 'A' -> 'B' as B
from my_table;
But when I'm trying to do something like:
select config::json -> 'A' -> 'B' ->> 'name' as name,
config::json -> 'A' -> 'B' ->> 'version' as version
from my_table;
I receive selection with empty-value columns
If the array size is fixed, you just need to tell which element of the array you want to retrieve,e.g.:
SELECT config->'A'->'B'->0->>'name' AS name,
config->'A'->'B'->0->>'version' AS version
FROM my_table;
But as your array contains multiple elements, use the function jsonb_array_elements in a subquery or CTE and in the outer query parse the each element individually, e.g:
SELECT rec->>'name', rec->>'version'
FROM (SELECT jsonb_array_elements(config->'A'->'B')
FROM my_table) j (rec);
Demo: db<>fiddle
First you should use the jsonb data type instead of json, see the documentation :
In general, most applications should prefer to store JSON data as
jsonb, unless there are quite specialized needs, such as legacy
assumptions about ordering of object keys.
Using jsonb, you can do the following :
SELECT DISTINCT ON (c) c->'name' AS name, c->'version' AS version
FROM my_table
CROSS JOIN LATERAL jsonb_path_query(config :: jsonb, '$.** ? (exists(#.name))') AS c
dbfiddle
select e.value ->> 'name', e.value ->> 'version'
from
my_table cross join json_array_elements(config::json -> 'A' -> 'B') e

Asking for help on correct way to us SQL with CTE to create JSON_OBJECT

The requested JSON needs to be in this form:
{
"header": {
"InstanceName": "US"
},
"erpReferenceData": {
"erpReferences": [
{
"ServiceID": "fb16e421-792b-4e9c-935b-3cea04a84507",
"ERPReferenceID": "J0000755"
},
{
"ServiceID": "7d13d907-0932-44c0-ad81-600c9b97b6e5",
"ERPReferenceID": "J0000756"
}
]
}
}
The program that I created looks like this:
dcl-s OutFile sqltype(dbclob_file);
exec sql
With x as (
select json_object(
'InstanceName' : trim(Cntry) ) objHeader
from xmlhdr
where cntry = 'US'),
y as (
select json_object(
'ServiceID' VALUE S.ServiceID,
'ERPReferenceID' VALUE I.RefCod) oOjRef
FROM IMH I
INNER JOIN GUIDS G ON G.REFCOD = I.REFCOD
INNER JOIN SERV S ON S.GUID = G.GUID
WHERE G.XMLTYPE = 'Service')
VALUES (
select json_object('header' : objHeader Format json ,
'erpReferenceData' : json_object(
'erpReferences' VALUE
JSON_ARRAYAGG(
ObjRef Format json)))
from x
LEFT OUTER JOIN y ON 1=1
Group by objHeader)
INTO :OutFile;
This is the compile error I get:
SQL0122: Position 41 Column OBJHEADER or expression in SELECT list not valid.
I am asking if this is the correct way to create this SQL statement, is there a better easier way? Any idea how to rewrite the SQL statement to make it work correctly?
The key with generating JSON or XML for that matter is to start from the inside and work your way out.
(I've simplified the raw data into just a test table...)
with elm as(select json_object
('ServiceID' VALUE ServiceID,
'ERPReferenceID' VALUE RefCod) as erpRef
from jsontst)
select * from elm;
Now add the next layer as a CTE the builds on the first CTE..
with elm as(select json_object
('ServiceID' VALUE ServiceID,
'ERPReferenceID' VALUE RefCod) as erpRef
from jsontst)
, arr (arrDta) as (values json_array (select erpRef from elm))
select * from arr;
And the next layer...
with elm as(select json_object
('ServiceID' VALUE ServiceID,
'ERPReferenceID' VALUE RefCod) as erpRef
from jsontst)
, arr (arrDta) as (values json_array (select erpRef from elm))
, erpReferences (refs) as ( select json_object
('erpReferences' value arrDta )
from arr)
select *
from erpReferences;
Nice thing about building with CTE's is at each step, you can see the results so far...
You can actually always go back and stick a Select * from CTE; in the middle to see what you have at some point.
Note that I'm building this in Run SQL Scripts. Once you have the statement complete, you can embed it in your RPG program.

PG::InvalidParameterValue: ERROR: cannot extract element from a scalar

I'm fetching data from the JSON column by using the following query.
SELECT id FROM ( SELECT id,JSON_ARRAY_ELEMENTS(shipment_lot::json) AS js2 FROM file_table WHERE ('shipment_lot') = ('shipment_lot') ) q WHERE js2->> 'invoice_number' LIKE ('%" abc1123"%')
My Postgresql version is 9.3
Saved data in JSON column:
[{ "id"=>2981, "lot_number"=>1, "activate"=>true, "invoice_number"=>"abc1123", "price"=>378.0}]
However, I'm getting this error:
ActiveRecord::StatementInvalid (PG::InvalidParameterValue: ERROR: cannot extract element from a scalar:
SELECT id FROM
( SELECT id,JSON_ARRAY_ELEMENTS(shipment_lot::json)
AS js2 FROM file_heaps
WHERE ('shipment_lot') = ('shipment_lot') ) q
WHERE js2->> 'invoice_number' LIKE ('%abc1123%'))
How I can solve this issue.
Your issue is that you have improper JSON stored
If you try running your example data on postgres it will not run
SELECT ('[{ "id"=>2981, "lot_number"=>1, "activate"=>true, "invoice_number"=>"abc1123", "price"=>378.0}]')::json
This is the JSON formatted correctly:
SELECT ('[{ "id":2981, "lot_number":1, "activate":true, "invoice_number":"abc1123", "price":378.0}]')::json

Select rows from table with jsonb column based on arbitrary jsonb filter expression

Test data
DROP TABLE t;
CREATE TABLE t(_id serial PRIMARY KEY, data jsonb);
INSERT INTO t(data) VALUES
('{"a":1,"b":2, "c":3}')
, ('{"a":11,"b":12, "c":13}')
, ('{"a":21,"b":22, "c":23}')
Problem statement: I want to receive an arbitrary JSONB parameter which acts as a filter on column t.data, such as
{ "b":{ "from":0, "to":20 }, "c":13 }
and use this to select matching rows from my test table t.
In this example, I want rows where b is between 0 and 20 and c = 13.
No error is required if the filter specifies a "column" (or "tag") which does not exist in t.data - it just fails to find a match.
I've used numeric values for simplicity but would like an approach which generalises to text as well.
What I have tried so far. I looked at the containment approach, which works for equality conditions, but am stumped on a generic way of handling range conditions:
select * from t
where t.data#> '{"c":13}'::jsonb;
Background: This problem arose when building a generic table-preview page on a website (for Admin users).
The page displays a filter based on various columns in whichever table is selected for preview.
The filter is then passed to a function in Postgres DB which applies this dynamic filter condition to the table.
It returns a jsonb array of the rows matching the filter specified by the user.
This jsonb array is then used to populate the Preview resultset.
The columns which make up the filter may change.
My Postgres version is 9.6 - thanks.
if you want to parse { "b":{ "from":0, "to":20 }, "c":13 } you need a parser. It is out of scope of json functions, but you can write "generic" query using AND and OR to filter by such json, eg:
https://www.db-fiddle.com/f/jAPBQggG3p7CxqbKLMbPKw/0
with filt(f) as (values('{ "b":{ "from":0, "to":20 }, "c":13 }'::json))
select *
from t
join filt on
(f->'b'->>'from')::int < (data->>'b')::int
and
(f->'b'->>'to')::int > (data->>'b')::int
and
(data->>'c')::int = (f->>'c')::int
;
Thanks for the comments/suggestions.
I will definitely look at GraphQL when I have more time - I'm working under a tight deadline at the moment.
It seems the consensus is that a fully generic solution is not achievable without a parser.
However, I got a workable first draft - it's far from ideal but we can work with it. Any comments/improvements are welcome ...
Test data (expanded to include dates & text fields)
DROP TABLE t;
CREATE TABLE t(_id serial PRIMARY KEY, data jsonb);
INSERT INTO t(data) VALUES
('{"a":1,"b":2, "c":3, "d":"2018-03-10", "e":"2018-03-10", "f":"Blah blah" }')
, ('{"a":11,"b":12, "c":13, "d":"2018-03-14", "e":"2018-03-14", "f":"Howzat!"}')
, ('{"a":21,"b":22, "c":23, "d":"2018-03-14", "e":"2018-03-14", "f":"Blah blah"}')
First draft of code to apply a jsonb filter dynamically, but with restrictions on what syntax is supported.
Also, it just fails silently if the syntax supplied does not match what it expects.
Timestamp handling a bit kludgey, too.
-- Handle timestamp & text types as well as int
-- See is_timestamp(text) function at bottom
with cte as (
select t.data, f.filt, fk.key
from t
, ( values ('{ "a":11, "b":{ "from":0, "to":20 }, "c":13, "d":"2018-03-14", "e":{ "from":"2018-03-11", "to": "2018-03-14" }, "f":"Howzat!" }'::jsonb ) ) as f(filt) -- equiv to cross join
, lateral (select * from jsonb_each(f.filt)) as fk
)
select data, filt --, key, jsonb_typeof(filt->key), jsonb_typeof(filt->key->'from'), is_timestamp((filt->key)::text), is_timestamp((filt->key->'from')::text)
from cte
where
case when (filt->key->>'from') is null then
case jsonb_typeof(filt->key)
when 'number' then (data->>key)::numeric = (filt->>key)::numeric
when 'string' then
case is_timestamp( (filt->key)::text )
when true then (data->>key)::timestamp = (filt->>key)::timestamp
else (data->>key)::text = (filt->>key)::text
end
when 'boolean' then (data->>key)::boolean = (filt->>key)::boolean
else false
end
else
case jsonb_typeof(filt->key->'from')
when 'number' then (data->>key)::numeric between (filt->key->>'from')::numeric and (filt->key->>'to')::numeric
when 'string' then
case is_timestamp( (filt->key->'from')::text )
when true then (data->>key)::timestamp between (filt->key->>'from')::timestamp and (filt->key->>'to')::timestamp
else (data->>key)::text between (filt->key->>'from')::text and (filt->key->>'to')::text
end
when 'boolean' then false
else false
end
end
group by data, filt
having count(*) = ( select count(distinct key) from cte ) -- must match on all filter elements
;
create or replace function is_timestamp(s text) returns boolean as $$
begin
perform s::timestamp;
return true;
exception when others then
return false;
end;
$$ strict language plpgsql immutable;