Sorting results in recursive query - sql

I have a basic CATEGORIES-like table consisting of fields like the primary_key, a parent_id, title and a sorting integer.
I am able to retrieve the results using a CTE and convert them to a JSON array, but I also want them ordered by the sorting value, in addition to being grouped by parent_id.
So far:
with recursive parents as
(
select n.boat_type_id, n.title, '{}'::int[] as parents, 0 as level
from boat_types n
where n.parent_id is NULL
union all
select n.boat_type_id, n.title, parents || n.parent_id, level+1
from parents p
join boat_types n on n.parent_id = p.boat_type_id
where not n.boat_type_id = any(parents)
),
children as
(
select n.parent_id, json_agg(jsonb_build_object('title', n.title->>'en'))::jsonb as js
from parents tree
join boat_types n using(boat_type_id)
where level > 0 and not boat_type_id = any(parents)
group by n.parent_id
union all
select n.parent_id, jsonb_build_object('category', n.title->>'en') || jsonb_build_object('subcategories', js) as js
from children tree
join boat_types n on n.boat_type_id = tree.parent_id
)
select jsonb_agg(js) as categories
from children
where parent_id is null
The above gives me the result set and the structure I want, but how can I make both the nodes and the leaves follow the sorting value?
Sample response:
[
{
"sorting":0,
"category":"Motor",
"subcategories":[
{
"title":"Motor Yacht",
"sorting":2
},
{
"title":"Mega Yacht",
"sorting":1
}
]
},
{
"sorting":1,
"category":"Sailing",
"subcategories":[
{
"title":"Sailing Yacht",
"sorting":2
},
{
"title":"Cruiser Racer",
"sorting":1
}
]
},
{
"sorting":2,
"category":"Catamaran",
"subcategories":[
{
"title":"Catamaran",
"sorting":2
},
{
"title":"Trimaran",
"sorting":1
}
]
},
{
"sorting":3,
"category":"Other",
"subcategories":[
{
"title":"Other",
"sorting":2
},
{
"title":"Airboat",
"sorting":1
}
]
}
]
I have tried aggregating the sorting values into ARRAY fields and sorting by them, but it doesn't work.

You can use the order by clause in the json_agg() aggregate:
...
children as
(
select
n.parent_id,
json_agg(jsonb_build_object('title', n.title->>'en', 'sorting', n.sorting) order by n.sorting)::jsonb as js
from parents tree
join boat_types n using(boat_type_id)
where level > 0 and not boat_type_id = any(parents)
group by n.parent_id
union all
select
n.parent_id,
jsonb_build_object('category', n.title->>'en', 'sorting', n.sorting) || jsonb_build_object('subcategories', js) as js
from children tree
join boat_types n on n.boat_type_id = tree.parent_id
)
...
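To check what the query returns, the intended ordering can be reproduced outside the database. This is only a minimal Python sketch, assuming a flat list of rows with hypothetical keys id, parent_id, title and sorting, and a two-level tree as in the sample response. It sorts the categories as well as their subcategories by sorting.

```python
from collections import defaultdict

# Hypothetical rows mirroring boat_types; titles are plain strings here,
# not the JSON titles (title->>'en') used in the query.
rows = [
    {"id": 1, "parent_id": None, "title": "Motor", "sorting": 0},
    {"id": 2, "parent_id": None, "title": "Sailing", "sorting": 1},
    {"id": 3, "parent_id": 1, "title": "Motor Yacht", "sorting": 2},
    {"id": 4, "parent_id": 1, "title": "Mega Yacht", "sorting": 1},
    {"id": 5, "parent_id": 2, "title": "Sailing Yacht", "sorting": 2},
    {"id": 6, "parent_id": 2, "title": "Cruiser Racer", "sorting": 1},
]


def build_categories(rows):
    """Return top-level categories with subcategories, both sorted by 'sorting'."""
    children = defaultdict(list)
    for row in rows:
        children[row["parent_id"]].append(row)

    def by_sorting(items):
        return sorted(items, key=lambda r: r["sorting"])

    result = []
    for cat in by_sorting(children[None]):
        result.append({
            "category": cat["title"],
            "sorting": cat["sorting"],
            "subcategories": [
                {"title": sub["title"], "sorting": sub["sorting"]}
                for sub in by_sorting(children[cat["id"]])
            ],
        })
    return result


categories = build_categories(rows)
```

If the database output and this structure disagree on the order at either level, the ordering of the corresponding aggregate is the place to look.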

Related

Create nested json in Snowflake

I am trying to create a nested json in Snowflake and have narrowed down the query like below where I have nested it on id. However, I want the nested json to also apply to the inner layer and I am finding it hard to get the right query for it.
WITH subquery AS (
SELECT id, placeId, actionId, resultValue
FROM my_table
)
SELECT id,
'{"resultValues": {' || listagg('"' || placeId || '": {"' || actionId || '": ' || resultValue || '}', ',') within group (order by placeId) || '}}' as nested_json
FROM subquery
GROUP BY id;
Below is what the current result looks like for each id.
I am trying to get the actionId1 and actionId2 grouped under the placeId1 and placeId2 so that it looks like below. How do I get this done? Any ideas would be appreciated.
Meet FLATTEN() and LATERAL; they like to hang out with OBJECT_AGG(), who needs his own space via CTEs.
WITH CTE AS (
SELECT
parse_json(
' { "resultValues": [
{ "placeId1": { "actionId1": 1.1 } }, { "placeId1": { "actionId2": 1.2 } },
{ "placeId2": { "actionId1": 1.3 } }, { "placeId2":{ "actionId2": 1.4} } ] }'
) VOLIA
),
CTE2 AS (
SELECT
DISTINCT KIAORA.PATH KIAORA,
TE_REO.PATH TE_REO,
OBJECT_AGG(MAORI.PATH, MAORI.VALUE) OVER (PARTITION BY TE_REO.PATH) MAORI
FROM
CTE,
LATERAL FLATTEN(INPUT => VOLIA) KIAORA,
LATERAL FLATTEN(KIAORA.VALUE) HELLO,
LATERAL FLATTEN(HELLO.VALUE) TE_REO,
LATERAL FLATTEN (INPUT => TE_REO.VALUE) MAORI
)
SELECT
DISTINCT OBJECT_CONSTRUCT(
KIAORA,
ARRAY_CONSTRUCT(
OBJECT_AGG(TE_REO, MAORI) OVER (PARTITION BY KIAORA)
)
) ANSWER,
VOLIA
FROM
CTE2, CTE
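The target shape (actions grouped under their place, one document per id) may be easier to see outside SQL. The following Python sketch assumes hypothetical rows of the form (id, placeId, actionId, resultValue) standing in for my_table; it illustrates the grouping only and is not a Snowflake solution.

```python
import json
from collections import defaultdict

# Hypothetical rows standing in for my_table: (id, placeId, actionId, resultValue).
rows = [
    (1, "placeId1", "actionId1", 1.1),
    (1, "placeId1", "actionId2", 1.2),
    (1, "placeId2", "actionId1", 1.3),
    (1, "placeId2", "actionId2", 1.4),
]


def nest_by_place(rows):
    """Group action values under their place, one document per id."""
    docs = defaultdict(lambda: defaultdict(dict))
    for row_id, place_id, action_id, value in rows:
        docs[row_id][place_id][action_id] = value
    return {row_id: {"resultValues": dict(places)} for row_id, places in docs.items()}


nested = nest_by_place(rows)
print(json.dumps(nested[1], indent=2))
```

The two levels of grouping here correspond to the two OBJECT_AGG() steps in the query: the inner one collects actions per place, the outer one collects places.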

create a group of linked items

There is a list of users, who buy different product items. I want to group the item by user buying behavior. If any user buys two products, these shall be in the same group. The buying links the products.
user | item
-----+-----------
1    | cat food
1    | cat toy
2    | cat toy
2    | cat snacks
10   | dog food
10   | dog collar
11   | dog food
11   | candy
12   | candy
12   | apples
15   | paper
In this sample case all cat items shall be grouped together: "cat food" to "cat toy" to "cat snacks". The dog items, candy and apples should form one group, because user purchases link them. The paper is another group.
There are about 200 different products in the table and I need to do a disjoint-set union (DSU).
There are several JavaScript implementations of disjoint-set union (DSU); one of them is used here inside a BigQuery user-defined function (UDF). The main idea is to use a find and a union function and to store the links in a tree, represented as an array (see the gist linked in the code for details).
create temp function DSU(A array<struct<a string,b string>>)
returns array<struct<a string,b string>>
language js as
"""
// https://gist.github.com/KSoto/3300322fc2fb9b270dce2bf1e3d80cf3
// Disjoint-set bigquery
class DSU {
constructor() {
this.parents = [];
}
find(x) {
if(typeof this.parents[x] != "undefined") {
if(this.parents[x]<0) {
return x;
} else {
if(this.parents[x]!=x) {
this.parents[x]=this.find(this.parents[x]);
}
return (this.parents[x]);
}
} else {
this.parents[x]=-1;
return x;
}
}
union(x,y) {
var xpar = this.find(x);
var ypar = this.find(y);
if(xpar != ypar) {
this.parents[xpar]+=this.parents[ypar];
this.parents[ypar]=xpar;
}
}
console_print() {
// console.log(this.parents);
}
}
var dsu = new DSU();
for(var i in A){
dsu.union(A[i].a,A[i].b);
}
var out=[]
for(var i in A){
out[i]={b:dsu.find(A[i].a),a:A[i].a};
}
return out;
""";
with #recursive
your_table as (
SELECT 1 as user, "cat food" as item
UNION ALL SELECT 1, "cat toy"
UNION ALL SELECT 2, "cat snacks"
UNION ALL SELECT 2, "cat toy"
UNION ALL SELECT 10, "dog food"
union all select 10, "dog collar"
union all select 11, "dog food"
union all select 11, "candy"
union all select 12, "candy"
union all select 12, "apples"
union all select 15, "paper"
), helper as (
select distinct a, b
from (
Select user,min(item) as b, array_agg(item) as a_list
from your_table
group by 1
), unnest(a_list) as a
)
Select * except(tmp_count),
first_value(item) over(partition by b order by tmp_count desc,b) as item_most_common
from
(
select * ,
count(item) over(partition by b,item) as tmp_count
from your_table
left join (select X.a, min(X.b) as b from (select DSU(array_agg(struct(''||a,''||b))) as X from helper),unnest(X) X group by 1 order by 1) as combinder
on ''||item=combinder.a
)
The data is in the table your_table. A helper table is used to build pairs of items that were bought by the same user. Combined into an array, these pairs are passed to the UDF DSU. The function returns every item in column a and its group in column b. We want the most common item of the group to be shown as the group name, so we use window functions to determine it.
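The find/union idea can also be tried outside BigQuery. Below is a minimal Python sketch of a disjoint-set union applied to the sample purchases. It is not a port of the UDF: it uses a dictionary for the parent links and links each item to the first item seen for the same user (the query above uses the minimum item instead); the class and function names are made up for illustration.

```python
from collections import defaultdict

# Purchases from the sample table: (user, item).
purchases = [
    (1, "cat food"), (1, "cat toy"),
    (2, "cat toy"), (2, "cat snacks"),
    (10, "dog food"), (10, "dog collar"),
    (11, "dog food"), (11, "candy"),
    (12, "candy"), (12, "apples"),
    (15, "paper"),
]


class DisjointSet:
    """Minimal disjoint-set union with path compression."""

    def __init__(self):
        self.parent = {}

    def find(self, x):
        self.parent.setdefault(x, x)
        root = x
        while self.parent[root] != root:
            root = self.parent[root]
        # Path compression: point every node on the path at the root.
        while self.parent[x] != root:
            self.parent[x], x = root, self.parent[x]
        return root

    def union(self, x, y):
        root_x, root_y = self.find(x), self.find(y)
        if root_x != root_y:
            self.parent[root_y] = root_x


def group_items(purchases):
    """Return a sorted list of item groups linked by shared buyers."""
    dsu = DisjointSet()
    first_item = {}
    for user, item in purchases:
        # Link every item of a user to the first item seen for that user.
        dsu.union(first_item.setdefault(user, item), item)
    groups = defaultdict(set)
    for _, item in purchases:
        groups[dsu.find(item)].add(item)
    return sorted(sorted(g) for g in groups.values())


groups = group_items(purchases)
```

With the sample data this yields three groups: the cat items, the dog/candy/apples items, and paper on its own.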

PSQL Join alternative to return all rows

I've got a PSQL function that has 3 joins in it and the data is returned in a json object. I have a 4th table that I need to get data from but it has a one-to-many relationship with the table I wish to join on.
This is my current code:
select json_agg(row_to_json(s)) as results from (
select g.*,row_to_json(o.*) as e_occurence,
row_to_json(d.*) as e_definition,
row_to_json(u.*) as e_e_updates,
cardinality(o.m_ids) as m_count
from schema.e_group g
join schema.e_occurrence o on g.id = o.e_group_id
join schema.e_definition d on g.e_id = d.id
left join schema.e_e_updates u on d.id = u.e_id
) s
This gets me an array of objects that follows this rough structure:
[
{
"id": 11308158,
"e_id": 16,
"created_on": "2020-09-09T12:08:07.556062",
"event_occurence": {
"id": 9081887,
"e_id": 16,
"e_group_id": 11308158
},
"e_definition": {
"id": 16,
"name": "Placeholder name"
},
"e_e_updates": {
"id": 22,
"user_id": "7281057e-2876-1673-js7d-7cqj611b4557",
"e_id": 16
},
"m_count": 0
}
]
My problem is that the table e_e_updates can have multiple records for each corresponding e_definition.id.
Clearly the join will not work as hoped in this instance as I'd like e_e_updates to be an array of all the linked rows.
Is there an alternative means of solving this issue?
Basically, you need another level of aggregation. This should do what you want:
select json_agg(row_to_json(s)) as results
from (
select
g.*,
row_to_json(o.*) as e_occurence,
row_to_json(d.*) as e_definition,
u.u_arr as e_e_updates,
cardinality(o.m_ids) as m_count
from schema.e_group g
join schema.e_occurrence o on g.id = o.e_group_id
join schema.e_definition d on g.e_id = d.id
left join (
select eu.e_id, json_agg(row_to_json(eu.*)) as u_arr
from schema.e_e_updates eu
group by eu.e_id
) u on d.id = u.e_id
) s
You could also do this with a subquery:
select json_agg(row_to_json(s)) as results
from (
select
g.*,
row_to_json(o.*) as e_occurence,
row_to_json(d.*) as e_definition,
(
select json_agg(row_to_json(u.*))
from schema.e_e_updates u
where u.e_id = d.id
) as e_e_updates,
cardinality(o.m_ids) as m_count
from schema.e_group g
join schema.e_occurrence o on g.id = o.e_group_id
join schema.e_definition d on g.e_id = d.id
) s
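The "aggregate first, then attach" idea behind both queries can be sketched in a few lines of Python. The rows below are hypothetical (column names follow the question, values are made up); the point is only that the updates are collapsed to one array per e_id before they meet the definition rows, so no definition is duplicated.

```python
from collections import defaultdict

# Hypothetical rows; column names follow the question, values are made up.
definitions = [{"id": 16, "name": "Placeholder name"}, {"id": 17, "name": "Other"}]
updates = [
    {"id": 22, "e_id": 16, "user_id": "u1"},
    {"id": 23, "e_id": 16, "user_id": "u2"},
]


def attach_updates(definitions, updates):
    """Aggregate updates per e_id first, then attach one array per definition."""
    by_definition = defaultdict(list)
    for upd in updates:
        by_definition[upd["e_id"]].append(upd)
    return [
        # .get() yields None when nothing matches, like the left join / subquery.
        {**d, "e_e_updates": by_definition.get(d["id"])}
        for d in definitions
    ]


result = attach_updates(definitions, updates)
```

Each definition appears exactly once in the result, with either a list of its updates or None.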

Snowflake get_path() or flatten() array query - to find latest key:value

I have a column 'amp' in a table 'EXAMPLE'. Column 'amp' is an array which looks like this:
[{
"list": [{
"element": {
"x_id": "12356789XXX",
"y_id": "12356789XXX38998"
}
},
{
"element": {
"x_id": "5677888356789XXX",
"y_id": "1XXX387688"
}
}]
}]
How should I query using get_path() or flatten() (or some other alternative) to extract the latest x_id and y_id values?
In this example there are only 2 elements, but there could be 1 to 6000 elements containing x_id and y_id.
Help much appreciated!
Someone may have a more elegant way than this, but you can use a CTE. In the first table expression, grab the maximum array index. In the second, grab the values you need.
set json = '[{"list": [{"element": {"x_id": "12356789XXX","y_id": "12356789XXX38998"}},{"element": {"x_id": "5677888356789XXX","y_id": "1XXX387688"}}]}]';
create temp table foo(v variant);
insert into foo select parse_json($json);
with
MAX_INDEX(M) as
(
select max("INDEX") MAX_INDEX
from foo, lateral flatten(v, recursive => true)
),
VALS(V, P, K) as
(
select "VALUE", "PATH", "KEY"
from foo, lateral flatten(v, recursive => true)
)
select k as "KEY", V::string as VALUE from vals, max_index
where VALS.P = '[0].list[' || max_index.m || '].element.x_id' or
VALS.P = '[0].list[' || max_index.m || '].element.y_id'
;
Assuming that the outer array ALWAYS contains a single dictionary element, you could use this:
SELECT amp[0]:"list"[ARRAY_SIZE(amp[0]:"list")-1]:"element":"x_id"::VARCHAR AS x_id
,amp[0]:"list"[ARRAY_SIZE(amp[0]:"list")-1]:"element":"y_id"::VARCHAR AS y_id
FROM T
;
Or if you prefer a bit more modularity/readability, you could use this:
WITH CTE1 AS (
SELECT amp[0]:"list" AS _ARRAY
FROM T
)
,CTE2 AS (
SELECT _ARRAY[ARRAY_SIZE(_ARRAY)-1]:"element" AS _DICT
FROM CTE1
)
SELECT _DICT:"x_id"::VARCHAR AS x_id
,_DICT:"y_id"::VARCHAR AS y_id
FROM CTE2
;
Note: I have not used FLATTEN here because I did not see a good reason to use it.
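The same "take the last array element" logic, written as a small Python sketch for comparison. It assumes, like the answer above, that the outer array holds a single object and that "latest" means the last entry of its list; the function name is made up for illustration.

```python
import json

# The sample from the question as valid JSON (no trailing commas).
amp = json.loads("""
[{"list": [
    {"element": {"x_id": "12356789XXX", "y_id": "12356789XXX38998"}},
    {"element": {"x_id": "5677888356789XXX", "y_id": "1XXX387688"}}
]}]
""")


def latest_ids(amp):
    """Return (x_id, y_id) of the last entry in amp[0]["list"], or None if empty."""
    entries = amp[0]["list"]
    if not entries:
        return None
    element = entries[-1]["element"]
    return element["x_id"], element["y_id"]


latest = latest_ids(amp)
```

Indexing with -1 plays the role of ARRAY_SIZE(...)-1 in the SQL, and the empty-list check covers the case where there is nothing to return.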

sql query to select multiple items in sorted order

I am writing a POST API in C# to select some values from Azure Cosmos DB, using direct SQL queries.
The aim is to get the highest value for each id in the request.
request body:
[
{
"userid":"1"
},
{
"userid":"4"
}
]
Db looks like:
{
"userid":"1",
"value":"10",
"Date":"10-9-19"
}
{
"userid":"1",
"value":"20",
"Date":"11-8-19"
}
{
"userid":"4",
"value":"30",
"Date":"10-9-19"
}
{
"userid":"4",
"value":"40",
"Date":"11-9-19"
}
Expected output:
[
{
"userid":"4",
"value":"40",
"Date":"11-9-19"
},
{
"userid":"1",
"value":"20",
"Date":"11-8-19"
}
]
I tried collecting the ids into an array and using the IN operator, but I would appreciate a simpler query if there is one.
Try the following; it works for your sample data.
SELECT userid,
MAX(value) value,
MAX(Date) Date
FROM YourTable
GROUP BY userid
ORDER BY userid
If you want the Date that belongs to the row with MAX(value), try this instead.
SELECT Y.userid, Y.Value, Y.Date
FROM YourTable Y
JOIN
(
SELECT userid,
MAX(value) value
FROM YourTable
GROUP BY userid
)D ON D.userid = Y.userid AND D.value = Y.value
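One thing to watch with the sample documents: value is stored as a string, so a maximum taken over strings compares text, not numbers. The Python sketch below shows the intended "highest value per requested user" selection with an explicit numeric comparison; the function name and the in-memory list are assumptions for illustration, not Cosmos DB API calls.

```python
# Documents from the question; "value" is stored as a string.
docs = [
    {"userid": "1", "value": "10", "Date": "10-9-19"},
    {"userid": "1", "value": "20", "Date": "11-8-19"},
    {"userid": "4", "value": "30", "Date": "10-9-19"},
    {"userid": "4", "value": "40", "Date": "11-9-19"},
]


def top_per_user(docs, user_ids):
    """Return, for each requested user, the document with the highest numeric value."""
    best = {}
    for doc in docs:
        uid = doc["userid"]
        if uid not in user_ids:
            continue
        # Compare as integers: as strings, "9" would sort above "10".
        if uid not in best or int(doc["value"]) > int(best[uid]["value"]):
            best[uid] = doc
    # Highest value first, matching the expected output in the question.
    return sorted(best.values(), key=lambda d: int(d["value"]), reverse=True)


result = top_per_user(docs, {"1", "4"})
```

Keeping the whole document for the winning row is what the second query achieves with its join: the Date comes from the same row as the maximum value.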