Perform Aggregations using JSON1 and SQLite3 in JSON Objects

I just started using SQLite 3 with JSON1 support.
I have already created a table that has several columns.
One of these columns holds a JSON object.
What I want to do is perform aggregations within this object.
Running the following query:
select json_extract(report, '$.some_attribute')
from table1
group by var1
having date == max(date);
returns the following:
[{"count":55,"label":"A"},{"count":99,"label":"B"}, {"count":1,"label":"C"}]
[{"count":29,"label":"A"},{"count":285,"label":"B"},{"count":461,"label":"C"}]
[{"count":6642,"label":"A"},{"count":24859,"label":"B"},{"count":3031,"label":"C"}]
[{"count":489,"label":"A"},{"count":250,"label":"B"},{"count":74,"label":"C"}]
Now, what I want to do is group by the label key and, for example, sum the count key.
The output should be something like this:
[{"label": "A", "count": 7215},
{"label": "B", "count": 25493},
{"label": "C", "count": 3567}]
OR this:
A, B, C
7215, 25493, 3567
I've tried to implement the latter one like this:
select sum(A) as A, sum(B) as B, sum(C) as C
from (
select json_extract(report,
'$.some_attribute[0].count') as A,
json_extract(report,
'$.some_attribute[1].count') as B,
json_extract(report,
'$.some_attribute[2].count') as C
from table1
group by var1
having date == max(date));
The problem is that you cannot be sure all the objects in the array will be sorted the same way, so indexing by position may give wrong results.
Any solutions? Thanks!

If you "un-nest" the JSON strings returned from the first json_extract, for example with json_each, it becomes trivial. This worked in my repro:
WITH result as (select jsonString from jsonPlay)
select json_extract(value,'$.label') label,SUM(json_extract(value,'$.count'))
from result,json_each(jsonString)
group by label
Giving this result:
A| 7215
B| 25493
C| 3567
Basically, your select json_extract(report, '$.some_attribute') block replaces select jsonString from jsonPlay.
You could also use this to "columnize" it, as in your second option:
WITH result as (select jsonString from jsonPlay)
select SUM(CASE WHEN json_extract(value,'$.label')='A' then json_extract(value,'$.count') END) 'A',
SUM(CASE WHEN json_extract(value,'$.label')='B' then json_extract(value,'$.count') END) 'B',
SUM(CASE WHEN json_extract(value,'$.label')='C' then json_extract(value,'$.count') END) 'C'
from result,json_each(jsonString)
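For anyone who wants to try the json_each approach end to end, here is a minimal sketch using Python's built-in sqlite3 module. It assumes the SQLite library bundled with your Python has the JSON functions available, and the table contents are made up from the numbers in the question (the last row is deliberately stored in a different order). It passes the '$.some_attribute' path straight to json_each instead of calling json_extract first.

```python
import json
import sqlite3

# Hypothetical stand-in for the question's table1: one JSON report per row.
reports = [
    [{"count": 55, "label": "A"}, {"count": 99, "label": "B"}, {"count": 1, "label": "C"}],
    [{"count": 29, "label": "A"}, {"count": 285, "label": "B"}, {"count": 461, "label": "C"}],
    [{"count": 6642, "label": "A"}, {"count": 24859, "label": "B"}, {"count": 3031, "label": "C"}],
    # Deliberately out of order to show that array position does not matter.
    [{"count": 74, "label": "C"}, {"count": 489, "label": "A"}, {"count": 250, "label": "B"}],
]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE table1 (report TEXT)")
conn.executemany(
    "INSERT INTO table1 (report) VALUES (?)",
    [(json.dumps({"some_attribute": r}),) for r in reports],
)

# json_each yields one row per array element; group those rows by label.
totals = conn.execute(
    """
    SELECT json_extract(j.value, '$.label') AS label,
           SUM(json_extract(j.value, '$.count')) AS total
    FROM table1, json_each(table1.report, '$.some_attribute') AS j
    GROUP BY label
    ORDER BY label
    """
).fetchall()

print(totals)
```

Because the grouping is done on the label value itself, the order of the objects inside each array no longer matters.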

Related

How to extract a json value inside an array inside a json in SQL

I have a JSON being returned in my query called MetaDataJSON, inside which is an array of JSONs called Results. Inside each JSON in Results are two values, Chronic and Probability. There are a couple other tables that have been joined too. Is there a way to get Chronic in a column by itself? Right now I have gotten this far (table and variable names have been made generic):
SELECT DISTINCT
JSON_QUERY(mdj.value, '$.Results[0]') [Results]
FROM table1 t1
JOIN table2 t2 ON t2.parameter1 = t1.parameter1
AND t2.parameter2 = 'ASDF'
JOIN table3 t3 ON oad.parameter3 = oa.parameter3
AND t3.parameter4 = 11
CROSS APPLY OPENJSON(t3.MetaDataJSON) as mdj
This gets me a column called Results where each entry looks like:
{"Chronic": 0, "Probability": 0.0016}
Is there an efficient way to get Chronic in a column by itself? Thanks!
You can do it like this:
WITH jsons (x)
AS
(
-- replace this part with your query
Select a.x from (select '{"Chronic": 0, "Probability": 0.0016}' x ) as a, (SELECT 1 as y union select 2 as y ) as b
)
select
(SELECT value FROM OPENJSON(x,'$') where [key]='Chronic') as "Chronic",
(SELECT value FROM OPENJSON(x,'$') where [key]='Probability') as "Probability"
from jsons;
I think you can change the jsons CTE to use your query, but I cannot try it since I don't have your tables.
BTW, I assume you are using MSSQL.

snowflake json lateral subquery

I have the following in snowflake:
create or replace table json_tmp as select column1 as id, parse_json(column2) as c
from VALUES (1,
'{"id": "0x1",
"custom_vars": [
{ "key": "a", "value": "foo" },
{ "key": "b", "value": "bar" }
] }') v;
Based on the FLATTEN docs, I hoped to turn these into a table looking like this:
+-------+---------+-----+-----+
| db_id | json_id | a   | b   |
+-------+---------+-----+-----+
| 1     | 0x1     | foo | bar |
+-------+---------+-----+-----+
Here is the query I tried; it resulted in a SQL compilation error: "Object 'CUSTOM_VARS' does not exist."
select json_tmp.id as dbid,
f.value:id as json_id,
a.v,
b.v
from json_tmp,
lateral flatten(input => json_tmp.c) as f,
lateral flatten(input => f.value:custom_vars) as custom_vars,
lateral (select value:value as v from custom_vars where value:key = 'a') as a,
lateral (select value:value as v from custom_vars where value:key = 'b') as b;
What exactly is the error here? Is there a better way to do this transformation?
Note - your solution doesn't actually perform any joins - flatten is a "streaming" operation, it "explodes" the input, and then selects the rows it wants. If you only have 2 attributes in the data, it should be reasonably fast. However, if not, it can lead to an unnecessary data explosion (e.g. if you have 1000s of attributes).
The fastest solution depends on how your data is structured exactly, and what you can assume about the input. For example, if you know that 'a' and 'b' are always in that order, you can obviously use
select
id as db_id,
c:id,
c:custom_vars[0].value,
c:custom_vars[1].value
from json_tmp;
If you know that custom_vars is always 2 elements, but the order is not known, you could do e.g.
select
id as db_id,
c:id,
iff(c:custom_vars[0].key = 'a', c:custom_vars[0].value, c:custom_vars[1].value),
iff(c:custom_vars[0].key = 'b', c:custom_vars[0].value, c:custom_vars[1].value)
from json_tmp;
If the size of custom_vars is unknown, you could create a JavaScript function like extract_key(custom_vars, key) that would iterate over custom_vars and return value for the found key (or e.g. null or <empty_string> if not found).
Hope this helps. If not, please provide more details about your problem (data, etc).
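To make the extract_key idea concrete, here is a rough sketch of the lookup logic written in Python rather than as a Snowflake JavaScript function. The name extract_key comes from the paragraph above; the signature and the choice to return None when nothing matches are my own assumptions.

```python
from typing import Any, Optional


def extract_key(custom_vars: list[dict[str, Any]], key: str) -> Optional[Any]:
    """Return the value of the first entry whose "key" matches, else None."""
    for entry in custom_vars:
        if entry.get("key") == key:
            return entry.get("value")
    return None


# Order and length of the array do not matter for the lookup.
custom_vars = [{"key": "b", "value": "bar"}, {"key": "a", "value": "foo"}]
row = (
    extract_key(custom_vars, "a"),
    extract_key(custom_vars, "b"),
    extract_key(custom_vars, "z"),  # not present
)
print(row)
```

Each call scans the array once, so this stays cheap when only a few keys are needed per row.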
Update Nov 2019
There seems to be a function that does this sort of thing:
select json_tmp.id as dbid,
json_tmp.c:id as json_id,
object_agg(custom_vars.value:key, custom_vars.value:value):a as a,
object_agg(custom_vars.value:key, custom_vars.value:value):b as b
from
json_tmp,
lateral flatten(input => json_tmp.c, path => 'custom_vars') custom_vars
group by json_tmp.id, json_tmp.c:id
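As I understand it, object_agg builds one key/value object per group, and the :a and :b lookups then pick fields out of that object. A small Python sketch of the same idea, with made-up rows standing in for the flattened custom_vars, might look like this (it illustrates the logic only and is not Snowflake code):

```python
from collections import defaultdict

# Rows as they might look after flattening custom_vars:
# (db_id, json_id, key, value). The data is hypothetical.
flattened = [
    (1, "0x1", "a", "foo"),
    (1, "0x1", "b", "bar"),
]

# Build one key -> value mapping per (db_id, json_id) group.
objects = defaultdict(dict)
for db_id, json_id, key, value in flattened:
    objects[(db_id, json_id)][key] = value

# Pick the wanted keys out of each mapping; a missing key becomes None.
pivoted = [
    (db_id, json_id, obj.get("a"), obj.get("b"))
    for (db_id, json_id), obj in objects.items()
]
print(pivoted)
```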
Original answer Sept 2017
The following query seems to work:
select json_tmp.id as dbid,
json_tmp.c:id as json_id,
a.value:value a,
b.value:value b
from
json_tmp,
lateral flatten(input => json_tmp.c, path => 'custom_vars') a,
lateral flatten(input => json_tmp.c, path => 'custom_vars') b
where a.value:key = 'a' and b.value:key = 'b'
;
I'd prefer to filter in a subquery rather than on the join, so I'm still interested in seeing other answers.

Oracle: outer join(+) with or clause replacement

I have an enormous select that schematically looks like this:
SELECT c_1, c_2, ..., c_j FROM t_1, t_2, ..., t_k
WHERE e_11 = e_12(+)
AND e_21 = e_22(+)
AND ...
AND e_l1 = e_l2(+)
ORDER BY o
where j, k and l are in the hundreds and e_mn is a column from some table. I need to add new columns A_1 and A_2 to the select from a new table T. The new columns are connected to the former select via a column, call it B, from a table R. I want those rows where A_1 = B or A_2 = B, or those rows where there is no A_i corresponding to the value B.
Suppose I only had to deal with tables T and R; then I would want this:
SELECT * FROM R
LEFT OUTER JOIN T
ON (A_1 = B OR A_2 = B)
To mimic this behaviour I'd want something like this in the big select:
SELECT c_1, c_2, ..., c_j, A_1, A_2 FROM t_1, t_2, ..., t_k, T
WHERE e_11 = e_12(+)
AND e_21 = e_22(+)
AND ...
AND e_l1 = e_l2(+)
AND (B = A_1(+) OR B = A_2(+))
ORDER BY o
This is, however, syntactically incorrect, since the (+) operator cannot be used with the OR clause. And if I leave out the (+)'s, I lose those rows where there is no A_i corresponding to B.
What are my options here? Can I somehow find a way to do this without changing the whole body of the select? I doubt there is a reasonable way to do this, nevertheless I'd appreciate any help.
Thanks.

How to use a variable AS a where clause?

I have one where clause which I have to use multiple times. I am quite new to Oracle SQL, so please forgive my newbie mistakes :). I have read this website, but could not find the answer :(. Here's the SQL statement:
var condition varchar2(100)
exec :condition := 'column 1 = 1 AND column2 = 2, etc.'
Select a.content, b.content
from
(Select (DBMS_LOB.SUBSTR(ost_bama_vrij_veld.inhoud,3)) as content
from table_name
where category = X AND :condition
group by (DBMS_LOB.SUBSTR(ost_bama_vrij_veld.inhoud,3))
) A
,
(Select (DBMS_LOB.SUBSTR(ost_bama_vrij_veld.inhoud,100)) as content
from table_name
where category = Y AND :condition
group by (DBMS_LOB.SUBSTR(ost_bama_vrij_veld.inhoud,100))) B
GROUP BY
a.content, b.content
The content field is a CLOB field and unfortunately all values needed are in the same column. My query does not work, of course.
You can't use a bind variable for that much of a where clause, only for specific values. You could use a substitution variable if you're running this in SQL*Plus or SQL Developer (and maybe some other clients):
define condition = 'column 1 = 1 AND column2 = 2, etc.'
Select a.content, b.content
from
(Select (DBMS_LOB.SUBSTR(ost_bama_vrij_veld.inhoud,3)) as content
from table_name
where category = X AND &condition
...
From other places, including JDBC and OCI, you'd need to have the condition as a variable and build the query string using that, so it's repeated in the code that the parser sees. From PL/SQL you could use dynamic SQL to achieve the same thing. I'm not sure why just repeating the conditions is a problem though, binding arguments if values are going to change. Certainly with two clauses like this it seems a bit pointless.
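As a rough illustration of building the query string in client code, here is a Python sketch. The function name build_query, the bind names and the simplified subquery are all hypothetical; the point is only that the condition text is spliced into the SQL in both places before the statement is parsed, while the actual values remain bind variables.

```python
def build_query(condition: str) -> str:
    """Splice the same condition text into both subqueries.

    The condition is SQL text, so it must come from trusted code and
    never from user input. Values inside it should still be binds.
    """
    subquery = (
        "SELECT DBMS_LOB.SUBSTR(inhoud, {length}) AS content "
        "FROM table_name "
        "WHERE category = :{bind} AND {condition} "
        "GROUP BY DBMS_LOB.SUBSTR(inhoud, {length})"
    )
    a = subquery.format(length=3, bind="cat_x", condition=condition)
    b = subquery.format(length=100, bind="cat_y", condition=condition)
    return f"SELECT a.content, b.content FROM ({a}) a, ({b}) b"


# Hypothetical condition; :v1 and :v2 would be bound at execution time.
sql = build_query("column1 = :v1 AND column2 = :v2")
print(sql)
```

The resulting string would then be passed to whatever execute call your driver provides, together with the values for :cat_x, :cat_y, :v1 and :v2.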
But maybe you could approach this from a different angle and remove the need to repeat the where clause. Querying the table twice might not be efficient anyway. You could apply your condition once as a subquery, but without knowing your indexes or the selectivity of the conditions this could be worse:
with sub_table as (
select category, content
from my_table
where category in (X, Y)
and column 1 = 1 AND column2 = 2, etc.
)
Select a.content, b.content
from
(Select (DBMS_LOB.SUBSTR(ost_bama_vrij_veld.inhoud,3)) as content
from sub_table
where category = X
group by (DBMS_LOB.SUBSTR(ost_bama_vrij_veld.inhoud,3))
) A
,
(Select (DBMS_LOB.SUBSTR(ost_bama_vrij_veld.inhoud,100)) as content
from sub_table
where category = Y
group by (DBMS_LOB.SUBSTR(ost_bama_vrij_veld.inhoud,100))) B
GROUP BY
a.content, b.content
I'm not sure what the grouping is for - to eliminate duplicates? This only really makes sense if you have a single X and Y record matching the other conditions, doesn't it? Maybe I'm not following it properly.
You could also use a case expression:
select max(content_x), max(content_y)
from (
select
case when category = X
then DBMS_LOB.SUBSTR(ost_bama_vrij_veld.inhoud,3) end as content_x,
case when category = Y
then DBMS_LOB.SUBSTR(ost_bama_vrij_veld.inhoud,100) end as content_y
from my_table
where category in (X, Y)
and column 1 = 1 AND column2 = 2, etc.
)
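The case-expression pattern can be tried out without an Oracle instance. The sketch below runs the same shape of query against an in-memory SQLite database from Python, with substr standing in for DBMS_LOB.SUBSTR and an invented table and data, so treat it as an illustration of the technique rather than of Oracle behaviour.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical table: the shared condition is simply column1 = 1.
conn.execute("CREATE TABLE my_table (category TEXT, content TEXT, column1 INTEGER)")
conn.executemany(
    "INSERT INTO my_table VALUES (?, ?, ?)",
    [
        ("X", "abcdef", 1),
        ("Y", "uvwxyz", 1),
        ("X", "zzzzzz", 2),  # filtered out by the shared condition
    ],
)

# The table is read once; each CASE yields NULL for the other category,
# and MAX ignores those NULLs.
row = conn.execute(
    """
    SELECT MAX(content_x), MAX(content_y)
    FROM (
        SELECT CASE WHEN category = 'X' THEN substr(content, 1, 3) END AS content_x,
               CASE WHEN category = 'Y' THEN substr(content, 1, 100) END AS content_y
        FROM my_table
        WHERE category IN ('X', 'Y')
          AND column1 = 1
    )
    """
).fetchone()
print(row)
```

The shared condition appears only once here, which is what removes the need to repeat the where clause.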

How to find the occurrences of a column mapped to a corresponding column in a query SQL

I have a query as below
select custref, tetranumber
from
(select *
from cdsheader h, custandaddr c
where h.custref=c.cwdocid and c.addresstype = 'C' )
where tetranumber = '034096'
The objective is that each value in the 2nd column should have only one corresponding value in the 1st column.
Ex: 034096 should always have 2600135 as the first column.
I would like to check if there is any value apart from 2600135 for 034096.
(I am a Java developer and suggested a solution to avoid 1-to-n or n-to-n mappings of data, but there is already bad data in the DB (Oracle), so I would like to check for bad data so that I can delete it.)
re: The objective is the 2nd column should have only one corresponding 1st column
You'll need to use an aggregate function, like MAX or MIN, to determine which of the rows is returned.
Thanks for the responses, guys.
I have figured out a way, and here it goes:
select custref, count(distinct(tetranumber)) from(
select custref, tetranumber from cdsheader h, custandaddr c where h.custref=c.cwdocid and c.addresstype = 'C')
group by custref having count(distinct(tetranumber))>1
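Note that the query above groups by custref, so it finds custrefs that map to more than one tetranumber. If the check is meant in the direction stated in the question (each tetranumber should have a single custref), the grouping would go the other way round. Here is a small sketch of that variant, run against an in-memory SQLite database from Python with made-up rows; the table name pairs stands in for the join of cdsheader and custandaddr.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE pairs (custref TEXT, tetranumber TEXT)")
conn.executemany(
    "INSERT INTO pairs VALUES (?, ?)",
    [
        ("2600135", "034096"),
        ("2600135", "034096"),  # exact duplicate, not a conflict
        ("2600999", "034096"),  # second custref for the same tetranumber
        ("2600200", "050000"),
    ],
)

# Report every tetranumber that is linked to more than one custref.
conflicts = conn.execute(
    """
    SELECT tetranumber, COUNT(DISTINCT custref) AS custrefs
    FROM pairs
    GROUP BY tetranumber
    HAVING COUNT(DISTINCT custref) > 1
    """
).fetchall()
print(conflicts)
```

COUNT(DISTINCT ...) is what keeps plain duplicate rows from being reported as conflicts.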