Presto MAP_FROM_ENTRIES and CROSS JOIN UNNEST

I have two queries that I combine with UNION ALL and then expand with CROSS JOIN UNNEST. The goal is to get the output as item_name, item_value in a horizontal table rather than a vertical one.
WITH base AS(
SELECT ds,MAP_FROM_ENTRIES(ARRAY [('X',COUNT_IF(X != 0)),('y',SUM(Y))]) AS metrics_map
FROM table1
UNION ALL
SELECT ds,MAP_FROM_ENTRIES( ARRAY [('A',COUNT_IF(A != 0)),('b',SUM(B))]) AS metrics_map
FROM table2
)
SELECT ds,metric_name,metric_value from base cross join unnest(metrics_map) AS t(metric_name, metric_value)
The output should be ds, metric_name, metric_value, with metric_name taking the values X, y, A, B, but I only get A and B. Can anyone help me figure this out?

You can skip MAP_FROM_ENTRIES and use the default row field names (field0, field1) for your entries:
WITH dataset AS (
SELECT * FROM (VALUES
(1, ARRAY [('X', 1),('y',2)]),
(2, ARRAY [('A', 3),('B',4)])
) AS t (ds, metrics_array))
SELECT ds, metric.field0 as metric_name, metric.field1 as metric_value
FROM dataset
cross join unnest(metrics_array) as t(metric)
And possibly use transform to give more meaningful names:
WITH dataset AS (
SELECT * FROM (VALUES
(1, ARRAY [('X', 1), ('y',2)]),
(2, ARRAY [('A', 3),('B',4)])
) AS t (ds, metrics_array))
SELECT ds, metric.metric_name, metric.metric_value
FROM (
SELECT ds, transform(metrics_array, r -> CAST(r as ROW(metric_name VARCHAR, metric_value DOUBLE))) as metrics_array
FROM dataset
)
cross join unnest(metrics_array) as t(metric)
Output:
ds metric_name metric_value
1 X 1.0
1 y 2.0
2 A 3.0
2 B 4.0

Related

Grouping item recursively in sql

I have this table (test.mytable in the sql script below)
CREATE OR REPLACE TABLE test.mytable (item STRING(1), I_groupe STRING(1));
INSERT INTO test.mytable (item, I_groupe)
values
('A', '1'),
('B', '1'),
('B', '2'),
('C', '2'),
('D', '3');
item Intermediate_group
A 1
B 1
B 2
C 2
D 3
My goal is to group the items together. My expected result is:
item Final_group
A,B,C 1
D 2
I would like to group items A and B because they have at least one Intermediate_group in common (Intermediate_group 1). Then I would like to group A,B with C because B and C have an Intermediate_group in common (Intermediate_group 2). Item D shares no intermediate group with any other item, so it is alone in its final group.
I have this code:
WITH TEMP1 AS (
SELECT *
FROM (
select item as item_1,
array_agg(distinct I_groupe) as I_groupe1
from test.mytable
group by item_1) AS AA
cross join
(select item as item_2,
array_agg(distinct I_groupe) as I_groupe2
from test.mytable
group by item_2
) AS BB
)
,
TEMP2 AS (
SELECT item_1, item_2,
ARRAY(SELECT * FROM TEMP1.I_groupe1
INTERSECT DISTINCT
(SELECT * FROM TEMP1.I_groupe2)
) AS result
FROM TEMP1
)
,
TEMP3 AS (
SELECT item_1, item_2, test
FROM TEMP2, unnest(result) as test
)
,
TEMP4 AS (
SELECT STRING_AGG(DISTINCT item_2) as item, STRING_AGG(CAST(test AS STRING)) as I_groupe
FROM TEMP3
GROUP BY item_1
)
,
TEMP5 AS (
SELECT item, I_groupe
FROM TEMP4, UNNEST(SPLIT(item)) as item, UNNEST(SPLIT(I_groupe)) as I_groupe
)
I repeat this process manually three times for this "toy" example and finish with a SELECT DISTINCT to keep only one row per Final_group:
SELECT DISTINCT *
FROM TEMP14
But this does not scale to a real example. I would like to use a recursive function or a loop to automate this code.
Thanks in advance for your help
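One possible way to automate this, which is not from the original post: treat two items as connected when they share an intermediate group, then let a recursive CTE follow those connections until nothing new is reachable. The sketch below runs the idea in Python's built-in sqlite3 so it can be tried without a warehouse. The function name final_groups and the choice to number groups by their smallest item are assumptions made for illustration. The same idea should carry over to engines that support WITH RECURSIVE, though syntax details may differ.

```python
import sqlite3

def final_groups(rows):
    """rows: iterable of (item, intermediate_group) pairs.
    Returns [(comma-joined items, final_group_number), ...]."""
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE mytable (item TEXT, I_groupe TEXT)")
    con.executemany("INSERT INTO mytable VALUES (?, ?)", rows)
    labelled = con.execute("""
        WITH RECURSIVE
        -- two items are linked when they share an intermediate group
        edges(a, b) AS (
            SELECT t1.item, t2.item
            FROM mytable t1 JOIN mytable t2 ON t1.I_groupe = t2.I_groupe
        ),
        -- every item reachable from each starting item;
        -- UNION (not UNION ALL) drops repeats, so the recursion terminates
        reach(root, item) AS (
            SELECT item, item FROM mytable
            UNION
            SELECT r.root, e.b FROM reach r JOIN edges e ON e.a = r.item
        )
        -- the smallest root that reaches an item labels its component
        SELECT item, MIN(root) AS rep FROM reach GROUP BY item ORDER BY item
    """).fetchall()
    con.close()

    groups = {}
    for item, rep in labelled:
        groups.setdefault(rep, []).append(item)
    return [(",".join(items), n)
            for n, (rep, items) in enumerate(sorted(groups.items()), start=1)]

toy = [("A", "1"), ("B", "1"), ("B", "2"), ("C", "2"), ("D", "3")]
print(final_groups(toy))
```

The reach table can grow quadratically with the size of a group, so on large data a dedicated connected-components approach may be a better fit.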

pivot multi-level nested fields in bigquery

My bq table schema:
Continuing this post: bigquery pivoting with nested field
I'm trying to flatten this table. I would like to unnest the timeseries.data fields, i.e. the final number of rows should equal the total length of the timeseries.data arrays. I would also like to add annotation.properties.key entries with a certain value as additional columns, with annotation.properties.value as the column value; in this case, that would be the "margin" column. However, the following query gives me the error "Unrecognized name: data", even though I already have unnest(timeseries.data) as data after the last FROM. The output I am after looks like this:
flow_timestamp, channel_name, number_of_digits, timestamp, value, margin
2019-10-31 15:31:15.079674 UTC, channel_1, 4, 2018-02-28T02:00:00, 50, 0.01
query:
SELECT
flow_timestamp, timeseries.channel_name,
( SELECT MAX(IF(channel_properties.key = 'number_of_digits', channel_properties.value, NULL))
FROM UNNEST(timeseries.channel_properties) AS channel_properties
),
data.timestamp ,data.value
,(with subq as (select * from unnest(data.annotation))
select max(if (properties.key = 'margin', properties.value, null))
from (
select * from unnest(subq.properties)
) as properties
) as margin
FROM my_table
left join unnest(timeseries.data) as data
WHERE DATE(flow_timestamp) between "2019-10-28" and "2019-11-02"
order by flow_timestamp
Try below
#standardSQL
SELECT
flow_timestamp,
timeseries.channel_name,
( SELECT MAX(IF(channel_properties.key = 'number_of_digits', channel_properties.value, NULL))
FROM UNNEST(timeseries.channel_properties) AS channel_properties
) AS number_of_digits,
item.timestamp,
item.value,
( SELECT MAX(IF(prop.key = 'margin', prop.value, NULL))
FROM UNNEST(item.annotation) AS annot, UNNEST(annot.properties) prop
) AS margin
FROM my_table
LEFT JOIN UNNEST(timeseries.data) item
WHERE DATE(flow_timestamp) BETWEEN '2019-10-28' AND '2019-11-02'
ORDER BY flow_timestamp
Below is a slightly less verbose version of the same solution, but I usually prefer the one above because it is simpler to maintain:
#standardSQL
SELECT
flow_timestamp,
timeseries.channel_name,
( SELECT MAX(IF(key = 'number_of_digits', value, NULL))
FROM UNNEST(timeseries.channel_properties) AS channel_properties
) AS number_of_digits,
timestamp,
value,
( SELECT MAX(IF(key = 'margin', value, NULL))
FROM UNNEST(annotation), UNNEST(properties)
) AS margin
FROM my_table
LEFT JOIN UNNEST(timeseries.data)
WHERE DATE(flow_timestamp) BETWEEN '2019-10-28' AND '2019-11-02'
ORDER BY flow_timestamp
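To see what the LEFT JOIN UNNEST plus the two correlated subqueries compute, here is a rough Python model of one row. The row layout is only inferred from the field names the queries reference (the schema itself is not shown in the post), the second data element is made up to show the NULL case, and pick and flatten are hypothetical helper names.

```python
# Hypothetical row, shaped after the fields referenced in the queries above.
row = {
    "flow_timestamp": "2019-10-31 15:31:15.079674 UTC",
    "timeseries": {
        "channel_name": "channel_1",
        "channel_properties": [{"key": "number_of_digits", "value": "4"}],
        "data": [
            {"timestamp": "2018-02-28T02:00:00", "value": 50,
             "annotation": [{"properties": [{"key": "margin", "value": "0.01"}]}]},
            # made-up second element with no annotation at all
            {"timestamp": "2018-02-28T03:00:00", "value": 51, "annotation": []},
        ],
    },
}

def pick(pairs, key):
    """Models MAX(IF(k = key, v, NULL)): largest matching value, else None."""
    values = [p["value"] for p in pairs if p["key"] == key]
    return max(values) if values else None

def flatten(row):
    ts = row["timeseries"]
    digits = pick(ts["channel_properties"], "number_of_digits")
    # LEFT JOIN UNNEST keeps the parent row even if the array is empty
    items = ts["data"] or [None]
    out = []
    for item in items:
        # UNNEST(item.annotation) AS annot, UNNEST(annot.properties) AS prop
        props = [p for a in (item or {}).get("annotation", [])
                 for p in a["properties"]]
        out.append((row["flow_timestamp"], ts["channel_name"], digits,
                    item and item["timestamp"], item and item["value"],
                    pick(props, "margin")))
    return out

for flat in flatten(row):
    print(flat)
```

Each element of timeseries.data becomes one output row, and margin is simply NULL (None here) when an element carries no matching property.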

Array column and aggregation in BigQuery SQL: Why the values are not all aggregated?

I've executed the code below in BigQuery
SELECT ( --inner query
SELECT STRING_AGG(c) FROM t1.array_column c
)
FROM (
select 1 as f1, ['1','2','3'] as array_column
union all
select 2 as f1, ['5','6','7'] as array_column
) t1;
I expected something like
Row|f0_
1 | 1,2,3,5,6,7
because there is no GROUP BY in the inner query, so I expected STRING_AGG to be evaluated over all the rows.
SELECT STRING_AGG(c) FROM t1.array_column c
Instead I'm getting something like this:
Row|f0_
1 |1,2,3
2 |5,6,7
I'm having trouble understanding why I get this result.
This is your query:
SELECT (SELECT STRING_AGG(c) FROM t1.array_column c
)
FROM (select 1 as f1, ['1', '2', '3'] as array_column
union all
select 2 as f1, ['5', '6', '7'] as array_column
) t1;
First, I'm surprised it works. I thought you needed unnest():
SELECT (SELECT STRING_AGG(c) FROM UNNEST(t1.array_column) c
)
What is happening? Well, this would be more obvious if you selected f1. Then you would get:
1 1,2,3
2 5,6,7
This should make it more clear. For each row in t1 (and there are two rows), your code is:
unnesting the array into rows with a column called c.
reaggregating those rows into a string (with no name)
If you want to combine the elements in the arrays, use array_concat_agg():
SELECT array_concat_agg(array_column)
FROM (select 1 as f1, ['1','2','3'] as array_column
union all
select 2 as f1, ['5','6','7'] as array_column
) t1;
If you want this represented as a string instead of an array, use array_to_string():
SELECT array_to_string(array_concat_agg(array_column), ',')
FROM (select 1 as f1, ['1','2','3'] as array_column
union all
select 2 as f1, ['5','6','7'] as array_column
) t1;
Below is for BigQuery Standard SQL
#standardSQL
SELECT STRING_AGG((SELECT STRING_AGG(c) FROM t1.array_column c))
FROM (
SELECT 1 AS f1, ['1','2','3'] AS array_column UNION ALL
SELECT 2 AS f1, ['5','6','7'] AS array_column
) t1
and produces
Row f0_
1 1,2,3,5,6,7
Note 1: you were almost there. You were just missing the extra STRING_AGG that does the final grouping of the strings built from the array in each row.
Note 2: because array_column is of ARRAY type, it is treated as an inner table referenced as t1.array_column, and as such FROM t1.array_column c is equivalent to FROM UNNEST(array_column) c. A very cool hidden feature :o)
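For anyone who finds the per-row behaviour easier to see outside SQL, here is a small Python model of the three variants discussed in this thread. It only mimics the semantics; the variable names are made up, and unlike a Python list, BigQuery does not guarantee the order of aggregated values unless an ORDER BY is given inside the aggregate.

```python
rows = [(1, ["1", "2", "3"]), (2, ["5", "6", "7"])]

# Subquery in the SELECT list: evaluated once per outer row,
# so each row aggregates only its own array.
per_row = [",".join(arr) for _, arr in rows]

# ARRAY_CONCAT_AGG then ARRAY_TO_STRING: merge the arrays across rows, then join.
all_rows = ",".join(c for _, arr in rows for c in arr)

# Outer STRING_AGG wrapped around the per-row subquery.
nested = ",".join(per_row)

print(per_row)
print(all_rows)
print(nested)
```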

SQL Querying on tuple values

I need to write a SQL query that selects values from a table based on several tuples of selection criteria. It could be done using a WHERE clause like this:
where (a = 1 and b='a') or (a=5 and b='s')
Is the best way to select:
select a, pk from x where a in (1,5)
select b, pk from x where b in ('a','s')
and join the result of the two queries using the primary key?
Do you mean something (a self join) like this:
select x.a, x.pk
from x
join x x2 on x.pk=x2.pk
where x.a in (1,5)
and x2.b in ('a','s')
?
You can join against a table expression built from VALUES, and you can add as many rows to VALUES as you want. This works on MSSQL:
DECLARE @x TABLE ( a INT, b CHAR(1) )
INSERT INTO @x
VALUES ( 1, 'a' ),
( 1, 'b' ),
( 1, 'c' ),
( 2, 'd' ),
( 2, 'e' ),
( 5, 'f' ),
( 5, 's' )
SELECT x.*
FROM @x x
JOIN (
VALUES ( 1, 'a'),
( 5, 's')
) AS v( a, b ) ON x.a = v.a AND x.b = v.b
Output:
a b
1 a
5 s
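A quick way to compare the pairwise join with two independent IN filters (which is what the self join suggestion earlier in the thread amounts to) is to run both on a small table. This is a minimal sketch using Python's sqlite3 rather than MSSQL, and the rows (1, 's') and (5, 'a') are added here on purpose to expose the difference; they are not in the original sample data.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE x (a INTEGER, b TEXT)")
con.executemany("INSERT INTO x VALUES (?, ?)",
                [(1, "a"), (1, "s"), (2, "d"), (5, "a"), (5, "s")])

# Join against the wanted (a, b) pairs: both columns must match the same pair.
pairwise = con.execute("""
    WITH v(a, b) AS (VALUES (1, 'a'), (5, 's'))
    SELECT x.a, x.b
    FROM x JOIN v ON x.a = v.a AND x.b = v.b
    ORDER BY x.a, x.b
""").fetchall()

# Independent filters: any wanted a combined with any wanted b.
independent = con.execute("""
    SELECT a, b FROM x
    WHERE a IN (1, 5) AND b IN ('a', 's')
    ORDER BY a, b
""").fetchall()
con.close()

print(pairwise)     # only the two requested pairs
print(independent)  # also the mixed pairs (1, 's') and (5, 'a')
```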
Based on my understanding, you want to write a query that filters on a combination of two columns. Here is a simple solution that should work in any database.
Create a new column, say "COLUMN_NEW", in the same table, or build a temp table or a view with the new column (plus the existing columns from the original table).
Populate "COLUMN_NEW" with the concatenated values of column a and column b. For the example you mentioned, the values to look for in "COLUMN_NEW" are "1a" and "5s".
The concatenation syntax differs between databases, for example concat(a,b) in SQL Server.
The SQL to select records from the table is then select * from table where COLUMN_NEW in ('1a','5s');
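One caveat with plain concatenation: different pairs can produce the same string once either column can hold more than one character. The sketch below shows this with Python's sqlite3. The values (1, '1a') and (11, 'a') are hypothetical and wider than the single-character b in the question, and the '|' separator is just an assumption that only helps if it never occurs in the data.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE x (a INTEGER, b TEXT)")
con.executemany("INSERT INTO x VALUES (?, ?)", [(1, "1a"), (11, "a")])

# Plain concatenation: two different (a, b) pairs yield the same key.
plain = [k for (k,) in con.execute("SELECT a || b FROM x ORDER BY a")]

# A separator that cannot appear in the values keeps the keys distinct.
delimited = [k for (k,) in con.execute("SELECT a || '|' || b FROM x ORDER BY a")]
con.close()

print(plain)
print(delimited)
```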

Joining a list of values with table rows in SQL

Suppose I have a list of values, such as 1, 2, 3, 4, 5 and a table where some of those values exist in some column. Here is an example:
id name
1 Alice
3 Cindy
5 Elmore
6 Felix
I want to create a SELECT statement that will include all of the values from my list as well as the information from those rows that match the values, i.e., perform a LEFT OUTER JOIN between my list and the table, so the result would be like follows:
id name
1 Alice
2 (null)
3 Cindy
4 (null)
5 Elmore
How do I do that without creating a temp table or using multiple UNION operators?
In Microsoft SQL Server 2008 or later, you can use a table value constructor:
Select v.valueId, m.name
From (values (1), (2), (3), (4), (5)) v(valueId)
left Join otherTable m
on m.id = v.valueId
Postgres also has this construct, VALUES lists:
SELECT * FROM (VALUES (1, 'one'), (2, 'two'), (3, 'three')) AS t (num,letter)
Also note the possible Common Table Expression syntax which can be handy to make joins:
WITH my_values(num, str) AS (
VALUES (1, 'one'), (2, 'two'), (3, 'three')
)
SELECT num, str FROM my_values
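Putting the VALUES-in-a-CTE form together with the LEFT JOIN from the question, here is a minimal runnable sketch. It uses Python's sqlite3, which accepts the same CTE syntax; the table name people and the CTE name wanted are made up for the example.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE people (id INTEGER, name TEXT)")
con.executemany("INSERT INTO people VALUES (?, ?)",
                [(1, "Alice"), (3, "Cindy"), (5, "Elmore"), (6, "Felix")])

# The list of values drives the join, so ids with no match come back as NULL
# and ids outside the list (6) are left out.
rows = con.execute("""
    WITH wanted(id) AS (VALUES (1), (2), (3), (4), (5))
    SELECT w.id, p.name
    FROM wanted w
    LEFT JOIN people p ON p.id = w.id
    ORDER BY w.id
""").fetchall()
con.close()

for row in rows:
    print(row)
```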
With Oracle it's possible, though heavier (from Ask Tom):
with id_list as (
select 10 id from dual union all
select 20 id from dual union all
select 25 id from dual union all
select 70 id from dual union all
select 90 id from dual
)
select * from id_list;
The following solution for Oracle is adapted from this source. The basic idea is to exploit Oracle's hierarchical queries. You have to specify a maximum length for the list (100 in the sample query below).
select d.lstid
, t.name
from (
select substr(
csv
, instr(csv,',',1,lev) + 1
, instr(csv,',',1,lev+1 )-instr(csv,',',1,lev)-1
) lstid
from (select ','||'1,2,3,4,5'||',' csv from dual)
, (select level lev from dual connect by level <= 100)
where lev <= length(csv)-length(replace(csv,','))-1
) d
left join test t on ( d.lstid = t.id )
;
check out this sql fiddle to see it work.
Bit late on this, but for Oracle you could do something like this to get a table of values:
SELECT rownum + 5 /*start*/ - 1 as myval
FROM dual
CONNECT BY LEVEL <= 100 /*end*/ - 5 /*start*/ + 1
... And then join that to your table:
SELECT *
FROM
(SELECT rownum + 1 /*start*/ - 1 myval
FROM dual
CONNECT BY LEVEL <= 5 /*end*/ - 1 /*start*/ + 1) mypseudotable
left outer join myothertable
on mypseudotable.myval = myothertable.correspondingval
Assuming myTable is the name of your table, the following code should work.
;with x as
(
select top (select max(id) from [myTable]) number from [master]..spt_values
),
y as
(select row_number() over (order by x.number) as id
from x)
select y.id, t.name
from y left join myTable as t
on y.id = t.id;
Caution: This is SQL Server implementation.
fiddle
To generate the sequential numbers needed for the output (this avoids typing out n values by hand):
declare @site as int
set @site = 1
while @site<=200
begin
insert into ##table
values (@site)
set @site=@site+1
end
Final output (after the step above):
select * from ##table
select v.id,m.name from ##table as v
left outer join [source_table] m
on m.id=v.id
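A different technique from the loop above, for engines that support recursive common table expressions: generate the range in the query itself, with no temp table. The sketch below uses Python's sqlite3, and number_range is a hypothetical helper name. On SQL Server the same pattern is written without the RECURSIVE keyword, and recursion stops at 100 levels by default unless OPTION (MAXRECURSION n) is set.

```python
import sqlite3

def number_range(start, end):
    """Return the integers start..end (inclusive) produced by a recursive CTE."""
    if start > end:
        return []
    con = sqlite3.connect(":memory:")
    rows = con.execute("""
        WITH RECURSIVE seq(n) AS (
            SELECT ?                              -- first value
            UNION ALL
            SELECT n + 1 FROM seq WHERE n < ?     -- stop once the end is reached
        )
        SELECT n FROM seq ORDER BY n
    """, (start, end)).fetchall()
    con.close()
    return [n for (n,) in rows]

print(number_range(1, 5))
```

The seq CTE can then take the place of the numbers table in the LEFT OUTER JOIN.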
Suppose the table that holds the values 1, 2, 3, 4, 5 is named list_of_values, and the table that holds some of those values along with the name column is named some_values. Then you can do:
SELECT B.id,A.name
FROM [list_of_values] AS B
LEFT JOIN [some_values] AS A
ON B.ID = A.ID