Output multiple summarized lists with KQL - kql

I want to output multiple lists of unique column values with KQL.
For instance for the following table:
A | B | C
-----------
1 | x | one
1 | x | two
1 | y | one
I want to output
K | V
--------------
A | [1]
B | [x,y]
C | [one, two]
I accomplished this using summarize with make_list and two unions. Is it possible to accomplish this in a single query, without union?
Table
| distinct A
| summarize k="A", v= make_list(A)
union
Table
| distinct B
| summarize k="B", v= make_list(B)
...

If your data set is reasonably sized, you could try the narrow() plugin: https://learn.microsoft.com/en-us/azure/data-explorer/kusto/query/narrowplugin
datatable(A:int, B:string, C:string)
[
1, 'x', 'one',
1, 'x', 'two',
1, 'y', 'one',
]
| evaluate narrow()
| summarize make_set(Value) by Column
Column | set_Value
--------------------
A | ["1"]
B | ["x","y"]
C | ["one","two"]
Alternatively, you could use a combination of pack_all() and mv-apply:
datatable(A:int, B:string, C:string)
[
1, 'x', 'one',
1, 'x', 'two',
1, 'y', 'one',
]
| project p = pack_all()
| mv-apply p on (
extend key = tostring(bag_keys(p)[0])
| project key, value = p[key]
)
| summarize make_set(value) by key
key | set_value
--------------------
A | ["1"]
B | ["x","y"]
C | ["one","two"]
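To make the idea behind both queries concrete, here is a minimal plain-Python sketch of the same two steps: unpivot every row into (column, value) pairs, then collect the distinct values per column. It is only a model of the logic, not KQL; the function name distinct_values_by_column is hypothetical, and values are converted to strings to mirror the string values shown in the output above.

```python
from collections import defaultdict

# Same sample rows as the datatable in the queries above.
rows = [
    {"A": 1, "B": "x", "C": "one"},
    {"A": 1, "B": "x", "C": "two"},
    {"A": 1, "B": "y", "C": "one"},
]

def distinct_values_by_column(rows):
    """Unpivot rows into (column, value) pairs and keep distinct values per column."""
    result = defaultdict(list)
    for row in rows:
        for column, value in row.items():
            text = str(value)  # assumption: compare values as strings, like the output above
            if text not in result[column]:
                result[column].append(text)
    return dict(result)

summary = distinct_values_by_column(rows)
print(summary)
```

Each input row contributes one pair per column, which is why this approach is best kept to reasonably sized data sets.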

Related

SQL Presto Aggregate Table column values with another column values

Hi, I want to write a Presto SQL query for a data table (say user_data) that looks like this:
user | target | result
-----------------------------
1 | b | {A: 1}
2 | a | {C: 2}
1 | c | {A: 2, B: 3}
2 | d | {A: 1}
1 | d | {C: 4}
With this data table, I would like to generate the following two outputs.
Output 1: Count the number of unique targets for each result for each user. For example, for user 1, this user has 2 targets (b and c) who have result A. And it has one target for each result B (target c) and C (target d).
user | result
-------------------
1 | {A: 2, B:1, C:1}
2 | {A: 1, C: 1}
Output 2: Aggregate the last column based on the targets of the user.
user | result
-------------------
1 | {A:[b,c], B:[c], C:[d]}
2 | {A:[d], C:[a]}
Or, even better, can we make one table that has both columns?
user | result 1 | result 2
--------------------------------------------------
1 | {A:[b,c], B:[c], C:[d]} | {A: 2, B:1, C:1}
2 | {A:[d], C:[a]} | {A: 1, C: 1}
Can anyone help me with it? I would really appreciate it.
I'm pretty new to SQL so I didn't even know how to start it.
This can be achieved with map aggregate functions. Assuming that result originally is a map you can flatten it with unnest and then group by user and use multimap_agg and histogram functions:
-- sample data
WITH dataset(user, target, result) AS (
VALUES (1, 'b', map(array['A'], array[1])),
(2, 'a', map(array['C'], array[2])),
(1, 'c', map(array['A', 'B'], array[2, 3])),
(2, 'd', map(array['A'], array[1])),
(1, 'd', map(array['C'], array[4]))
)
-- query
select user, multimap_agg(k, target), histogram(k)
from dataset,
unnest(result) as t(k, v)
group by user;
Output:
user | _col1 | _col2
------------------------------------------------
2 | {A=[d], C=[a]} | {A=1, C=1}
1 | {A=[b, c], B=[c], C=[d]} | {A=2, B=1, C=1}
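For readers who want to see what the unnest plus group by is doing, here is a rough plain-Python model of the query. It is a sketch of the logic only, not Presto; the function name aggregate_by_user is hypothetical. Each map entry becomes one (key, value) pair, the targets are collected per key (as multimap_agg does), and the occurrences of each key are counted (as histogram does).

```python
from collections import Counter, defaultdict

# Same sample data as the dataset CTE above: (user, target, result map).
user_data = [
    (1, "b", {"A": 1}),
    (2, "a", {"C": 2}),
    (1, "c", {"A": 2, "B": 3}),
    (2, "d", {"A": 1}),
    (1, "d", {"C": 4}),
]

def aggregate_by_user(rows):
    """Return {user: (targets per key, count per key)}."""
    targets = defaultdict(lambda: defaultdict(list))
    counts = defaultdict(Counter)
    for user, target, result in rows:
        for key in result:                     # unnest: one pair per map entry
            targets[user][key].append(target)  # models multimap_agg(k, target)
            counts[user][key] += 1             # models histogram(k)
    return {user: (dict(targets[user]), dict(counts[user])) for user in targets}

by_user = aggregate_by_user(user_data)
print(by_user)
```

Note that the count is per row, so it equals the number of unique targets only when each (user, target) pair appears once, as in the sample data.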

Aggregate columns containing dictionary in SQL presto

Hi, I want to write a Presto SQL query for a data table (say user_data) that looks like this:
user | target | result
-----------------------------
1 | b | {A: 1}
2 | a | {C: 2}
1 | c | {A: 2, B: 3}
2 | d | {A: 1}
1 | d | {C: 4}
With this data table, I would like to generate the following two outputs.
Output 1: Aggregate the values of the {key:value} dictionary based on the user and regardless of target
user | result
-------------------
1 | {A:3, B:3, C:4}
2 | {A:1, C:2}
Output 2: Aggregate the last column based on the targets of the user.
user | result
-------------------
1 | {A:[b,c], B:[c], C:[d]}
2 | {A:[d], C:[a]}
Can anyone help me with it? I would really appreciate it.
The second one can be easily achieved with multimap_agg (add transform_values with array_distinct to remove duplicates if needed):
-- sample data
WITH dataset(user, target, result) AS (
values (1, 'b', map(array['A'], array[1])),
(2, 'a', map(array['C'], array[2])),
(1, 'c', map(array['A', 'B'], array[1, 2]))
)
-- query
select user, multimap_agg(k, target)
from dataset,
unnest(result) as t (k,v)
group by user;
Output:
user | _col1
--------------------------
1 | {A=[b, c], B=[c]}
2 | {C=[a]}
As for the first one - you can look into using map_union_sum if it is available in your version of Presto. Or use some magic with unnest and transform_values:
-- query
select user,
transform_values(
multimap_agg(k, v),
(k,v) -> reduce(v, 0, (s, x) -> s + x, s -> s) -- or array_sum if available
)
from dataset,
unnest(result) as t (k, v)
group by user;
Output:
user | _col1
-------------------
1 | {A=2, B=2}
2 | {C=2}
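If it helps to check the numbers, the per-key sum can be modelled in a few lines of plain Python. This is only a sketch of what the query computes on the three sample rows above, not Presto code; the function name sum_maps_by_user is hypothetical.

```python
from collections import defaultdict

# Same three sample rows as the dataset CTE in this answer.
dataset = [
    (1, "b", {"A": 1}),
    (2, "a", {"C": 2}),
    (1, "c", {"A": 1, "B": 2}),
]

def sum_maps_by_user(rows):
    """Sum the map values key by key for each user, ignoring the target."""
    totals = defaultdict(lambda: defaultdict(int))
    for user, _target, result in rows:
        for key, value in result.items():  # unnest(result) as t(k, v)
            totals[user][key] += value     # the reduce(...) over the collected values
    return {user: dict(per_key) for user, per_key in totals.items()}

sums = sum_maps_by_user(dataset)
print(sums)
```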

PostgreSQL query get all entries with many-to-many relationship with another table

I have a table posts which has a many-to-many relationship with tags; the pivot table is called posts_tags.
I want to be able to retrieve all posts by a list of tag IDs.
Imagine
posts
id | text
--------
1 | "foo"
2 | "bar"
3 | "baz"
posts_tags
post_id | tag_id
-----------------
1 | 1
1 | 2
1 | 3
2 | 1
3 | 1
tags
id | name
--------
1 | "foo"
2 | "bar"
3 | "baz"
With tag id's [1, 2, 3], I should get back [{id: 1, text: "foo"}]
With tag id's [1], I should get back [{id: 1, text: "foo"}, {id: 2, text: "bar"}, {id: 3, text: "baz"}]
Basically, I want to retrieve all the posts related to the list of tags.
You can use a subquery to filter posts that have all the specified tags:
select json_agg(json_build_object('id', p.id, 'text', p.text))
from posts p
where (select count(*)
       from json_array_elements('[1, 2, 3]') v
       join posts_tags t on t.post_id = p.id and v.value::text::int = t.tag_id
      ) = json_array_length('[1, 2, 3]')
See fiddle.
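A common alternative to the correlated subquery is to join, group by post, and compare the number of distinct matching tags with the number of requested tags. The sketch below uses that different technique, and it runs against an in-memory SQLite database through Python's sqlite3 module rather than PostgreSQL, so treat it as an illustration under those assumptions. The helper name posts_with_all_tags is hypothetical; the table and column names follow the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE posts (id INTEGER PRIMARY KEY, text TEXT);
    CREATE TABLE posts_tags (post_id INTEGER, tag_id INTEGER);
    INSERT INTO posts VALUES (1, 'foo'), (2, 'bar'), (3, 'baz');
    INSERT INTO posts_tags VALUES (1, 1), (1, 2), (1, 3), (2, 1), (3, 1);
""")

def posts_with_all_tags(conn, tag_ids):
    """Return (id, text) for posts that carry every tag in tag_ids."""
    placeholders = ", ".join("?" for _ in tag_ids)
    query = f"""
        SELECT p.id, p.text
        FROM posts p
        JOIN posts_tags t ON t.post_id = p.id
        WHERE t.tag_id IN ({placeholders})
        GROUP BY p.id, p.text
        HAVING COUNT(DISTINCT t.tag_id) = ?
        ORDER BY p.id
    """
    # The last parameter is the number of distinct requested tags.
    return conn.execute(query, [*tag_ids, len(set(tag_ids))]).fetchall()

print(posts_with_all_tags(conn, [1, 2, 3]))
print(posts_with_all_tags(conn, [1]))
```

The tag IDs are passed as bound parameters; only the placeholder list is built with string formatting.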

select linked rows in the same table

I'm creating a branching dialog game, and used a dialog tool that outputs JSON with a link and a link_path to connect dialogs together. I've parsed and inserted this structure in PostgreSQL.
I want to query a subset of rows, let's say starting with row 1, and follow the link_path until the link_path is null. Successive rows may be out of order.
For example, in the table below,
starting with row 1, whose link_path is b, I find the row with link = b;
this gives me row 3, whose link_path is c, so I find the row with link = c;
this gives me row 4, whose link_path is null, so we return this set: [row 1, row 3, row 4]
link | link_path | info
--------------------------
a | b | asdjh
w | y | akhaq
b | c | uiqwd
c | null | isado
y | z | qwiuu
z | null | nzabo
In PostgreSQL, how can I select rows like this without creating a loop of queries? My goal is performance.
You can use a recursive query:
with recursive cte as (
select t.* from mytable t where link = 'a'
union all
select t.*
from cte c
inner join mytable t on t.link = c.link_path
)
select * from cte
Demo on DB Fiddle:
link | link_path | info
:--- | :-------- | :----
a | b | asdjh
b | c | uiqwd
c | null | isado
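To try the recursive query without a PostgreSQL server, here is a small sketch that runs the same CTE shape against an in-memory SQLite database via Python's sqlite3 module. SQLite is an assumption made purely for the demo, and the helper name follow_links is hypothetical; the table name mytable and the sample rows come from the question and answer.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE mytable (link TEXT, link_path TEXT, info TEXT);
    INSERT INTO mytable VALUES
        ('a', 'b',  'asdjh'),
        ('w', 'y',  'akhaq'),
        ('b', 'c',  'uiqwd'),
        ('c', NULL, 'isado'),
        ('y', 'z',  'qwiuu'),
        ('z', NULL, 'nzabo');
""")

def follow_links(conn, start):
    """Follow link_path from the row whose link equals start until it is NULL."""
    query = """
        WITH RECURSIVE cte AS (
            SELECT t.link, t.link_path, t.info FROM mytable t WHERE t.link = ?
            UNION ALL
            SELECT t.link, t.link_path, t.info
            FROM cte c
            JOIN mytable t ON t.link = c.link_path
        )
        SELECT link, link_path, info FROM cte
    """
    return conn.execute(query, (start,)).fetchall()

chain = follow_links(conn, "a")
print(chain)
```

The recursion stops by itself because a NULL link_path never matches the join condition. If the data could contain a cycle, UNION ALL would recurse forever; replacing it with UNION is one way to stop on repeated rows.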

Is there a simple way to transform an array based on a dimension table?

I have two tables:
One with a column which is an array of identifiers.
Another which is a dimension table that maps these identifiers to another value.
I'm looking to transform the column of the first table using the dimension table.
Example:
Table 1:
Column A | Column B
'Bob' | ['a', 'b', 'c']
'Terry' | ['a']
Dimension Table:
Column C | Column D
'a' | 1
'b' | 2
'c' | 3
Expected Output:
Column A | Column B
'Bob' | [1,2,3]
'Terry' | [1]
Is there a way to do this (preferably in Presto) without exploding and re-aggregating the array column ?
I guess you would be able to do this without exploding and re-aggregating by using transform_keys, though I'm not sure it's any easier.
SELECT map_keys(transform_keys(MAP(ARRAY['a','c'], ARRAY[null,null]),
(k, v) -> MAP(ARRAY['a', 'b', 'c', 'd'], ARRAY[1,2,3,4])[k]));
I guess it requires that the dimension table is not "too big".
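The idea is easier to see outside SQL: load the dimension table into a map once, then look up each array element in it. Here is a minimal plain-Python sketch of that, using the example data from the question; it models the approach rather than Presto itself, and the function name map_identifiers is hypothetical.

```python
# Example data from the question.
table_1 = [
    ("Bob", ["a", "b", "c"]),
    ("Terry", ["a"]),
]
# Dimension table held as an in-memory map (assumes it is small enough to fit).
dimension = {"a": 1, "b": 2, "c": 3}

def map_identifiers(rows, dimension):
    """Replace every identifier in each array with its dimension value."""
    return [(name, [dimension[item] for item in items]) for name, items in rows]

mapped = map_identifiers(table_1, dimension)
print(mapped)
```

An identifier missing from the dimension map raises a KeyError here; use dimension.get(item) instead if missing identifiers should map to None.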