Is there a simple way to transform an array based on a dimension table? - sql

I have two tables:
One with a column which is an array of identifiers.
Another, which is a dimension table that maps these identifiers to other values.
I'm looking to transform the column of the first table using the dimension table.
Example:
Table 1:
Column A | Column B
'Bob' | ['a', 'b', 'c']
'Terry' | ['a']
Dimension Table:
Column C | Column D
'a' | 1
'b' | 2
'c' | 3
Expected Output:
Column A | Column B
'Bob' | [1,2,3]
'Terry' | [1]
Is there a way to do this (preferably in Presto) without exploding and re-aggregating the array column?

I guess you could do this without exploding and re-aggregating by using transform_keys, though I'm not sure it is easier:
SELECT map_keys(transform_keys(MAP(ARRAY['a','c'], ARRAY[null,null]),
(k, v) -> MAP(ARRAY['a', 'b', 'c', 'd'], ARRAY[1,2,3,4])[k]));
I guess it requires that the dimension table is not "too big", since the whole table has to fit into a single map.
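The idea behind the answer is to collect the dimension table into one lookup map and then replace each array element with its mapped value. A minimal Python sketch of those semantics follows; the in-memory lists and the `map_arrays` helper are hypothetical stand-ins for the two tables, not Presto code. One design note: routing the array through map keys (as `transform_keys` does) cannot represent duplicate identifiers, because map keys are unique, whereas a per-element lookup keeps order and duplicates. Presto's array `transform` function is the per-element analogue, if your version has it.

```python
# Hypothetical in-memory stand-ins for "Table 1" and the dimension table.
table1 = [("Bob", ["a", "b", "c"]), ("Terry", ["a"])]
dimension = [("a", 1), ("b", 2), ("c", 3)]


def map_arrays(rows, dim_rows):
    """Replace every identifier in each array with its dimension value.

    The dimension table is first collected into one dictionary (the
    "not too big" assumption), then each array is transformed element by
    element, keeping order and duplicates. Unknown identifiers become None,
    similar to a NULL from a failed lookup.
    """
    lookup = dict(dim_rows)
    return [(name, [lookup.get(x) for x in ids]) for name, ids in rows]


print(map_arrays(table1, dimension))
# [('Bob', [1, 2, 3]), ('Terry', [1])]
```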

Related

SQL Presto Aggregate Table column values with another column values

Hi, I want to write a Presto SQL query for a data table (say, user_data) that looks like this:
user | target | result
-----------------------------
1 | b | {A: 1}
2 | a | {C: 2}
1 | c | {A: 2, B: 3}
2 | d | {A: 1}
1 | d | {C: 4}
With this data table, I would like to generate the following two outputs.
Output 1: Count the number of unique targets for each result key for each user. For example, user 1 has 2 targets (b and c) with result A, and one target each for result B (target c) and result C (target d).
user | result
-------------------
1 | {A: 2, B:1, C:1}
2 | {A: 1, C: 1}
Output 2: For each user, aggregate the targets by the keys of the last column.
user | result
-------------------
1 | {A:[b,c], B:[c], C:[d]}
2 | {A:[d], C:[a]}
Or, even better, can we make one table that has both columns?
user | result 1 | result 2
--------------------------------------------------
1 | {A:[b,c], B:[c], C:[d]} | {A: 2, B:1, C:1}
2 | {A:[d], C:[a]} | {A: 1, C: 1}
Can anyone help me with it? I would really appreciate it.
I'm pretty new to SQL, so I didn't even know how to start.
This can be achieved with map aggregate functions. Assuming that result is originally a map, you can flatten it with unnest, then group by user and use the multimap_agg and histogram functions:
-- sample data
WITH dataset(user, target, result) AS (
VALUES (1, 'b', map(array['A'], array[1])),
(2, 'a', map(array['C'], array[2])),
(1, 'c', map(array['A', 'B'], array[2, 3])),
(2, 'd', map(array['A'], array[1])),
(1, 'd', map(array['C'], array[4]))
)
-- query
select user, multimap_agg(k, target), histogram(k)
from dataset,
unnest(result) as t(k, v)
group by user;
Output:
user | _col1 | _col2
2 | {A=[d], C=[a]} | {A=1, C=1}
1 | {A=[b, c], B=[c], C=[d]} | {A=2, B=1, C=1}
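The query works by turning every map entry into its own row (unnest) and then aggregating per user twice: once collecting targets per key, once counting keys. The Python sketch below models that flow on the question's sample data; `targets_and_counts` is a hypothetical helper, not a Presto API. Note that counting key occurrences equals counting unique targets only when a user never has the same target twice with the same key, and Presto does not promise the order of targets inside each aggregated array.

```python
from collections import defaultdict

# Sample data from the question: (user, target, result map).
dataset = [
    (1, "b", {"A": 1}),
    (2, "a", {"C": 2}),
    (1, "c", {"A": 2, "B": 3}),
    (2, "d", {"A": 1}),
    (1, "d", {"C": 4}),
]


def targets_and_counts(rows):
    """Flatten each result map (like UNNEST), then group by user.

    targets[user][key] collects targets in arrival order (multimap_agg-like);
    counts[user][key] counts occurrences of each key (histogram-like).
    """
    targets = defaultdict(lambda: defaultdict(list))
    counts = defaultdict(lambda: defaultdict(int))
    for user, target, result in rows:
        for key in result:  # one unnested row per map entry
            targets[user][key].append(target)
            counts[user][key] += 1
    # Convert the nested defaultdicts to plain dicts for display.
    return (
        {u: dict(m) for u, m in targets.items()},
        {u: dict(m) for u, m in counts.items()},
    )


print(targets_and_counts(dataset))
```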

Aggregate columns containing dictionary in SQL presto

Hi, I want to write a Presto SQL query for a data table (say, user_data) that looks like this:
user | target | result
-----------------------------
1 | b | {A: 1}
2 | a | {C: 2}
1 | c | {A: 2, B: 3}
2 | d | {A: 1}
1 | d | {C: 4}
With this data table, I would like to generate the following two outputs.
Output 1: Sum the values of the {key: value} dictionary per user, regardless of target.
user | result
-------------------
1 | {A:3, B:3, C:4}
2 | {A:1, C:2}
Output 2: For each user, aggregate the targets by the keys of the last column.
user | result
-------------------
1 | {A:[b,c], B:[c], C:[d]}
2 | {A:[d], C:[a]}
Can anyone help me with it? I would really appreciate it.
The second one can easily be achieved with multimap_agg (add transform_values with array_distinct to remove duplicates if needed):
-- sample data
WITH dataset(user, target, result) AS (
values (1, 'b', map(array['A'], array[1])),
(2, 'a', map(array['C'], array[2])),
(1, 'c', map(array['A', 'B'], array[1, 2]))
)
-- query
select user, multimap_agg(k, target)
from dataset,
unnest(result) as t (k,v)
group by user;
Output:
user | _col1
1 | {A=[b, c], B=[c]}
2 | {C=[a]}
As for the first one, you can look into using map_union_sum if it is available in your version of Presto. Otherwise, combine unnest, multimap_agg and transform_values to sum the collected values:
-- query
select user,
transform_values(
multimap_agg(k, v),
(k,v) -> reduce(v, 0, (s, x) -> s + x, s -> s) -- or array_sum if available
)
from dataset,
unnest(result) as t (k, v)
group by user;
Output:
user | _col1
1 | {A=2, B=2}
2 | {C=2}
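The unnest + multimap_agg + reduce query amounts to "per user, add up the values for each key and ignore the target". A small Python sketch of that behavior, using the three sample rows from this answer (where B is 2, not 3, so the totals match the output above); `sum_maps_per_user` is a hypothetical helper, not a Presto function.

```python
from collections import defaultdict

# The three sample rows used in this answer (note B=2 here, not 3).
dataset = [
    (1, "b", {"A": 1}),
    (2, "a", {"C": 2}),
    (1, "c", {"A": 1, "B": 2}),
]


def sum_maps_per_user(rows):
    """Sum map values per key for each user, ignoring the target.

    Mirrors unnest + multimap_agg(k, v) followed by summing each value
    array (the reduce / array_sum step inside transform_values).
    """
    totals = defaultdict(lambda: defaultdict(int))
    for user, _target, result in rows:
        for key, value in result.items():
            totals[user][key] += value
    return {u: dict(m) for u, m in totals.items()}


print(sum_maps_per_user(dataset))
# {1: {'A': 2, 'B': 2}, 2: {'C': 2}}
```

Run on the question's full five-row data instead, the same logic yields the expected {A:3, B:3, C:4} for user 1 and {A:1, C:2} for user 2.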

Output multiple summarized lists with KQL

I want to output multiple lists of unique column values with KQL.
For instance, for the following table:
A | B | C
1 | x | one
1 | x | two
1 | y | one
I want to output:
K | V
A | [1]
B | [x, y]
C | [one, two]
I accomplished this using summarize with make_list and two unions, but I've been wondering whether it's possible to do it in the same query without union.
Table
| distinct A
| summarize k="A", v= make_list(A)
union
Table
| distinct B
| summarize k="B", v= make_list(B)
...
If your data set is reasonably sized, you could try using the narrow() plugin: https://learn.microsoft.com/en-us/azure/data-explorer/kusto/query/narrowplugin
datatable(A:int, B:string, C:string)
[
1, 'x', 'one',
1, 'x', 'two',
1, 'y', 'one',
]
| evaluate narrow()
| summarize make_set(Value) by Column
Column | set_Value
A | ["1"]
B | ["x","y"]
C | ["one","two"]
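Both KQL answers follow the same two-step idea: unpivot every row into (column, value) pairs, then collect the distinct values per column. The Python sketch below models that on the sample rows; `narrow` and `distinct_values_by_column` are hypothetical helpers named after the plugin, not Kusto APIs. Values are stringified because the output above shows `"1"` as a string; the sketch keeps first-seen order, which `make_set` itself does not promise.

```python
# Sample rows from the question; column names are the dict keys.
rows = [
    {"A": 1, "B": "x", "C": "one"},
    {"A": 1, "B": "x", "C": "two"},
    {"A": 1, "B": "y", "C": "one"},
]


def narrow(rows):
    """Unpivot each row into (column, value) pairs, one pair per cell."""
    for row in rows:
        for column, value in row.items():
            # The output above shows values as strings, hence str().
            yield column, str(value)


def distinct_values_by_column(rows):
    """Collect the distinct values of each column (make_set by Column)."""
    sets = {}
    for column, value in narrow(rows):
        values = sets.setdefault(column, [])
        if value not in values:
            values.append(value)
    return sets


print(distinct_values_by_column(rows))
# {'A': ['1'], 'B': ['x', 'y'], 'C': ['one', 'two']}
```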
Alternatively, you could use a combination of pack_all() and mv-apply:
datatable(A:int, B:string, C:string)
[
1, 'x', 'one',
1, 'x', 'two',
1, 'y', 'one',
]
| project p = pack_all()
| mv-apply p on (
extend key = tostring(bag_keys(p)[0])
| project key, value = p[key]
)
| summarize make_set(value) by key
key | set_value
A | ["1"]
B | ["x","y"]
C | ["one","two"]

Is it possible to map values onto a table given corresponding row and column indices in SQL?

I have a SQL table in the form of:
| value | row_loc | column_loc |
|-------|---------|------------|
| a | 0 | 1 |
| b | 1 | 1 |
| c | 1 | 0 |
| d | 0 | 0 |
I would like to find a way to map it onto a table/grid, given the indices, using SQL. Something like:
| d | a |
| c | b |
(The context being, I would like to create a colour map with colours corresponding to values a, b, c, d, in the locations specified)
I would be able to do this iteratively in Python, but I cannot figure out how to do it in SQL, or whether it is even possible! Any help or guidance on this problem would be greatly appreciated!
EDIT: a, b, c, d are examples of numeric values (in practice they could not be selected by name, so I'm relying on selecting them by location). Also worth noting: the number of rows and columns will always be the same. The value column is also not the primary key of this table, so it is not necessarily unique; it is just a continuous value.
Yes, it is possible, assuming the number of columns is bounded, since a SQL query can only return a fixed, predetermined number of columns. The number of rows in the result set equals the number of distinct row_loc values, so we group by row_loc. Then we choose each value with a simple CASE expression.
with t (value, row_loc, column_loc) as (
select 'a', 0, 1 from dual union all
select 'b', 1, 1 from dual union all
select 'c', 1, 0 from dual union all
select 'd', 0, 0 from dual
)
select max(case column_loc when 0 then value else null end) as column0
, max(case column_loc when 1 then value else null end) as column1
from t
group by row_loc
order by row_loc
I tested it on Oracle. I wasn't sure what to do if multiple values match the same coordinate, so I chose max. With other vendors you could also use special clauses such as an aggregate with FILTER (WHERE ...), e.g. count(...) filter (where ...). Alternatively, Oracle's PIVOT clause can be used.
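The query is a conditional-aggregation pivot: one output row per row_loc, and for each fixed column index the max of the values whose column_loc matches. A minimal Python sketch of the same logic, assuming the column count is passed in up front just as the SQL hard-codes column0 and column1; `pivot_grid` is a hypothetical helper.

```python
# Rows from the question: (value, row_loc, column_loc).
t = [("a", 0, 1), ("b", 1, 1), ("c", 1, 0), ("d", 0, 0)]


def pivot_grid(rows, n_columns):
    """Build one output row per distinct row_loc (GROUP BY row_loc).

    For each column index i, keep the max value whose column_loc == i,
    i.e. max(case column_loc when i then value end); None if no match.
    """
    grouped = {}
    for value, row_loc, column_loc in rows:
        grouped.setdefault(row_loc, []).append((column_loc, value))
    grid = []
    for row_loc in sorted(grouped):  # ORDER BY row_loc
        cells = grouped[row_loc]
        grid.append([
            max((v for c, v in cells if c == i), default=None)
            for i in range(n_columns)
        ])
    return grid


print(pivot_grid(t, 2))
# [['d', 'a'], ['c', 'b']]
```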

How to unnest two lists from two columns in BigQuery without cross product, as individual rows

I have a table in BigQuery with two columns, each of which contains an array. For a given row, both columns contain arrays of the same length, but that length can vary from row to row:
WITH tbl AS (
select ['a','b','c'] AS one, [1,2,3] as two
union all
select ['a','x'] AS one, [10,20] as two
)
select * from tbl
So the table will look like:
row | one | two
-----------------------
1 | [a,b,c] | [1,2,3]
2 | [a,x] | [10,20]
I would like to unnest in such a way that each row in the new table has one element of the array from column one and the corresponding element from column two. So from the table above, I am looking to get:
row | one | two
---------
1 | a | 1
2 | b | 2
3 | c | 3
4 | a | 10
5 | x | 20
Any help would be much appreciated! Thanks!
Below is for BigQuery Standard SQL:
#standardSQL
SELECT z.*
FROM `project.dataset.table` t,
UNNEST(ARRAY(
SELECT AS STRUCT one, two
FROM UNNEST(one) one WITH OFFSET
JOIN UNNEST(two) two WITH OFFSET
USING(OFFSET)
)
) z
You can test and play with the above using the sample data from your question; the result will be:
Row one two
1 a 1
2 b 2
3 c 3
4 a 10
5 x 20
I don't fully understand the syntax; could you please explain it?
Explanation:
Step 1
For each row in the table, the array below is calculated:
ARRAY(
SELECT AS STRUCT one, two
FROM UNNEST(one) one WITH OFFSET
JOIN UNNEST(two) two WITH OFFSET
USING(OFFSET)
)
Elements of this array are structs holding the respective values from the two columns; the values are matched with each other by JOINing on their positions in the initial arrays (OFFSET).
Step 2
Then this array is UNNESTed and cross-joined with its respective row in the table; the row itself is ignored, and only that struct (z) is brought into the output.
Step 3
Finally, to output separate columns rather than a struct, z.* is used.
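The key point of Steps 1 to 3 is that the two unnested arrays are joined on their offsets rather than cross-joined, so a row with n elements yields n output rows instead of n × n. A Python sketch of that offset join on the question's sample data; `unnest_pairwise` is a hypothetical helper, and `enumerate()` plays the role of `WITH OFFSET`.

```python
# Sample rows: each row holds two equally long arrays, "one" and "two".
tbl = [
    (["a", "b", "c"], [1, 2, 3]),
    (["a", "x"], [10, 20]),
]


def unnest_pairwise(rows):
    """Pair array elements by position instead of cross-joining them.

    Models UNNEST(one) WITH OFFSET JOIN UNNEST(two) WITH OFFSET USING(OFFSET):
    enumerate() supplies the offset and only equal offsets are matched.
    """
    out = []
    for one, two in rows:
        by_offset = dict(enumerate(two))
        for offset, value in enumerate(one):
            if offset in by_offset:  # inner join: unmatched offsets drop out
                out.append((value, by_offset[offset]))
    return out


print(unnest_pairwise(tbl))
# [('a', 1), ('b', 2), ('c', 3), ('a', 10), ('x', 20)]
```

Because the JOIN is an inner join, arrays of unequal length would silently lose their extra elements; the question guarantees equal lengths, so this does not arise there.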
Hope this helped :o)