BigQuery - Recursive query to get leaf rows - google-bigquery

I have a table in BigQuery which contains self referencing data like -
id | name | parent_id
----------------------------
1 | Gross Margin | null
2 | Revenue | 1
3 | Sales A | 2
4 | Sales B | 2
5 | 1001 | 3
6 | 1002 | 4
7 | OPEX | null
8 | Salaries | 7
9 | Payroll | 8
10 | Allowances | 9
11 | Commissions | 9
I want to write a query that returns the leaf rows of any row. For example if I give Gross Margin (or 1) as input, the query should return 1001 and 1002 (or 5 and 6) as output. Similarly if I give OPEX (or 7) as input, the query should return Allowances and Commissions (or 10 and 11) as output.

Updated script with improved convergence logic
DECLARE run_away_stop INT64 DEFAULT 0;
DECLARE flag BYTES;
CREATE TEMP TABLE ttt AS
SELECT parent_id, ARRAY_AGG(id order by id) children FROM `project.dataset.table` WHERE NOT parent_id IS NULL GROUP BY parent_id;
LOOP
SET (run_away_stop, flag) = (SELECT AS STRUCT run_away_stop + 1, md5(to_json_string(array_agg(t order by parent_id))) FROM ttt t);
CREATE OR REPLACE TEMP TABLE ttt1 AS
SELECT parent_id, ARRAY(SELECT DISTINCT id FROM UNNEST(children) id order by id) children
FROM (
SELECT parent_id, ARRAY_CONCAT_AGG(children) children
FROM (
SELECT t2.parent_id, ARRAY_CONCAT(t1.children, t2.children) children
FROM ttt t1, ttt t2
WHERE (SELECT COUNTIF(t1.parent_id = id) FROM UNNEST(t2.children) id) > 0
) GROUP BY parent_id
);
CREATE OR REPLACE TEMP TABLE ttt AS
SELECT * FROM ttt1 UNION ALL
SELECT * FROM ttt WHERE NOT parent_id IN (SELECT parent_id FROM ttt1);
IF (flag = (SELECT md5(to_json_string(array_agg(t order by parent_id))) FROM ttt t)) OR run_away_stop > 20 THEN BREAK; END IF;
END LOOP;
CREATE OR REPLACE TEMP TABLE ttt AS
SELECT id,
(
SELECT STRING_AGG(CAST(id AS STRING) order by id)
FROM ttt.children id
) children_as_list
FROM (SELECT DISTINCT id FROM `project.dataset.table`) d
LEFT JOIN ttt ON id = parent_id;
SELECT t1.id, STRING_AGG(child) leafs_list
FROM ttt t1, UNNEST(SPLIT(children_as_list)) child
JOIN (SELECT id FROM ttt WHERE children_as_list IS NULL) t2
ON t2.id = CAST(child AS INT64)
GROUP BY t1.id
ORDER BY id;
When applied to sample data in your question - it took 3 iterations and output is
Also, when applied to bigger example from your comments - it took 5 iterations and output is

BigQuery Team just introduced Recursive CTE! Hooray!!
With recursive cte you can use below approach
with recursive iterations as (
select id, parent_id from your_table
where not id in (
select parent_id from your_table
where not parent_id is null
)
union all
select b.id, a.parent_id
from your_table a join iterations b
on b.parent_id = a.id
)
select parent_id, string_agg('' || id order by id) as leafs_list
from iterations
where not parent_id is null
group by parent_id
If applied to sample data in your question - output is
Hope you agree it is more manageable and effective then when we were "forced" to use scripts for such logic!

Related

Count of Table1_IDs in Table2_arrays

I'm working with two tables:
CREATE TABLE Table1
(
id int,
name varchar
)
CREATE TABLE Table2
(
id int,
name varchar,
link array<int>
)
Table2.link contains values that correspond to Table1.id. I'd like to count how many times each Table1.id appears in an instance of Table2.link. This would be trivial using cell references in Excel, but I can't figure out how to do it with a SQL query.
Presto
select *
from (select l.id
,count(*) as cnt
from Table2 cross join unnest (link) as l(id)
group by l.id
) t2
where t2.id in (select id from Table1)
order by id
presto:default> select *
-> from (select l.id
-> ,count(*) as cnt
-> from Table2 cross join unnest (link) as l(id)
-> group by l.id
-> ) t2
-> where t2.id in (select id from Table1)
-> order by id;
id | cnt
----+-----
1 | 7
2 | 5
3 | 4
(3 rows)
PostgreSQL demo
create table Table1 (id int);
create table Table2 (arr int[]);
insert into Table1 values
(1),(2),(3)
;
insert into Table2 values
(array[1,5]),(array[1,3]),(array[1,2,3]),(array[2,3])
,(array[1,2,4]),(array[1,2]),(array[1,3,5]),(array[1,2,4])
;
select *
from (select unnest(arr) as id
,count(*) as cnt
from Table2
group by id
) t2
where t2.id in (select id from Table1)
order by id
+----+-----+
| id | cnt |
+----+-----+
| 1 | 7 |
+----+-----+
| 2 | 5 |
+----+-----+
| 3 | 4 |
+----+-----+

Using Case in a select statement

Consider the following table
create table temp (id int, attribute varchar(25), value varchar(25))
And values into the table
insert into temp select 100, 'First', 234
insert into temp select 100, 'Second', 512
insert into temp select 100, 'Third', 320
insert into temp select 101, 'Second', 512
insert into temp select 101, 'Third', 320
I have to deduce a column EndResult which is dependent on 'attribute' column. For each id, I have to parse through attribute values in the order
First, Second, Third and choose the very 1st value which is available i.e. for id = 100, EndResult should be 234 for the 1st three records.
Expected result:
| id | EndResult |
|-----|-----------|
| 100 | 234 |
| 100 | 234 |
| 100 | 234 |
| 101 | 512 |
| 101 | 512 |
I tried with the following query in vain:
select id, case when isnull(attribute,'') = 'First'
then value
when isnull(attribute,'') = 'Second'
then value
when isnull(attribute,'') = 'Third'
then value
else '' end as EndResult
from
temp
Result
| id | EndResult |
|-----|-----------|
| 100 | 234 |
| 100 | 512 |
| 100 | 320 |
| 101 | 512 |
| 101 | 320 |
Please suggest if there's a way to get the expected result.
You can use analytical function like dense_rank to generate a numbering, and then select those rows that have the number '1':
select
x.id,
x.attribute,
x.value
from
(select
t.id,
t.attribute,
t.value,
dense_rank() over (partition by t.id order by t.attribute) as priority
from
Temp t) x
where
x.priority = 1
In your case, you can conveniently order by t.attribute, since their alphabetical order happens to be the right order. In other situations you could convert the attribute to a number using a case, like:
order by
case t.attribute
when 'One' then 1
when 'Two' then 2
when 'Three' then 3
end
In case the attribute column have different values which are not in alphabetical order as is the case above you can write as:
with cte as
(
select id,
attribute,
value,
case attribute when 'First' then 1
when 'Second' then 2
when 'Third' then 3 end as seq_no
from temp
)
, cte2 as
(
select id,
attribute,
value,
row_number() over ( partition by id order by seq_no asc) as rownum
from cte
)
select T.id,C.value as EndResult
from temp T
join cte2 C on T.id = C.id and C.rownum = 1
DEMO
Here is how you can achieve this using ROW_NUMBER():
WITH t
AS (
SELECT *
,ROW_NUMBER() OVER (
PARTITION BY id ORDER BY (CASE attribute WHEN 'First' THEN 1
WHEN 'Second' THEN 2
WHEN 'Third' THEN 3
ELSE 0 END)
) rownum
FROM TEMP
)
SELECT id
,(
SELECT value
FROM t t1
WHERE t1.id = t.id
AND rownum = 1
) end_result
FROM t;
For testing purpose, please see SQL Fiddle demo here:
SQL Fiddle Example
keep it simple
;with cte as
(
select row_number() over (partition by id order by (select 1)) row_num, id, value
from temp
)
select t1.id, t2.value
from temp t1
left join cte t2
on t1.Id = t2.id
where t2.row_num = 1
Result
id value
100 234
100 234
100 234
101 512
101 512

How to comapre two columns of a table in sql?

In a table there are two columns:
-----------
| A | B |
-----------
| 1 | 5 |
| 2 | 1 |
| 3 | 2 |
| 4 | 1 |
-----------
Want a table where if A=B then
-------------------
|Match | notMatch|
-------------------
| 1 | 5 |
| 2 | 3 |
| Null | 4 |
-------------------
How can i do this?
I tried something which shows the Matched part
select distinct C.A as A from Table c inner join Table d on c.A=d.B
Try this:
;WITH TempTable(A, B) AS(
SELECT 1, 5 UNION ALL
SELECT 2, 1 UNION ALL
SELECT 3, 2 UNION ALL
SELECT 4, 1
)
,CTE(Val) AS(
SELECT A FROM TempTable UNION ALL
SELECT B FROM TempTable
)
,Match AS(
SELECT
Rn = ROW_NUMBER() OVER(ORDER BY Val),
Val
FROM CTE c
GROUP BY Val
HAVING COUNT(Val) > 1
)
,NotMatch AS(
SELECT
Rn = ROW_NUMBER() OVER(ORDER BY Val),
Val
FROM CTE c
GROUP BY Val
HAVING COUNT(Val) = 1
)
SELECT
Match = m.Val,
NotMatch= n.Val
FROM Match m
FULL JOIN NotMatch n
ON n.Rn = m.Rn
Try with EXCEPT, MINUS and INTERSECT Statements.
like this:
SELECT A FROM TABLE1 INTERSECT SELECT B FROM TABLE1;
You might want this:
SELECT DISTINCT
C.A as A
FROM
Table c
LEFT OUTER JOIN
Table d
ON
c.A=d.B
WHERE
d.ID IS NULL
Please Note that I use d.ID as an example because I don't see your schema. An alternate is to explicitly state all d.columns IS NULL in WHERE clause.
Your requirement is kind of - let's call it - interesting. Here is a way to solve it using pivot. Personally I would have chosen a different table structure and another way to select data:
Test data:
DECLARE #t table(A TINYINT, B TINYINT)
INSERT #t values
(1,5),(2,1),
(3,2),(4,1)
Query:
;WITH B AS
(
( SELECT A FROM #t
EXCEPT
SELECT B FROM #t)
UNION ALL
( SELECT B FROM #t
EXCEPT
SELECT A FROM #t)
), A AS
(
SELECT A val
FROM #t
INTERSECT
SELECT B
FROM #t
), combine as
(
SELECT val, 'A' col, row_number() over (order by (select 1)) rn FROM A
UNION ALL
SELECT A, 'B' col, row_number() over (order by (select 1)) rn
FROM B
)
SELECT [A], [B]
FROM combine
PIVOT (MAX(val) FOR [col] IN ([A], [B])) AS pvt
Result:
A B
1 3
2 4
NULL 5

tSQL UNPIVOT of comma concatenated column into multiple rows

I have a table that has a value column. The value could be one value or it could be multiple values separated with a comma:
id | assess_id | question_key | item_value
---+-----------+--------------+-----------
1 | 859 | Cust_A_1 | 1,5
2 | 859 | Cust_B_1 | 2
I need to unpivot the data based on the item_value to look like this:
id | assess_id | question_key | item_value
---+-----------+--------------+-----------
1 | 859 | Cust_A_1 | 1
1 | 859 | Cust_A_1 | 5
2 | 859 | Cust_B_1 | 2
How does one do that in tSQL on SQL Server 2012?
We have a user defined function that we use for stuff like this that we called "split_delimiter":
CREATE FUNCTION [dbo].[split_delimiter](#delimited_string VARCHAR(8000), #delimiter_type CHAR(1))
RETURNS TABLE AS
RETURN
WITH cte10(num) AS
(
SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1 UNION ALL
SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1 UNION ALL
SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1
)
,cte100(num) AS
(
SELECT 1
FROM cte10 t1, cte10 t2
)
,cte10000(num) AS
(
SELECT 1
FROM cte100 t1, cte100 t2
)
,cte1(num) AS
(
SELECT TOP (ISNULL(DATALENGTH(#delimited_string),0)) ROW_NUMBER() OVER (ORDER BY (SELECT NULL))
FROM cte10000
)
,cte2(num) AS
(
SELECT 1
UNION ALL
SELECT t.num+1
FROM cte1 t
WHERE SUBSTRING(#delimited_string,t.num,1) = #delimiter_type
)
,cte3(num,[len]) AS
(
SELECT t.num
,ISNULL(NULLIF(CHARINDEX(#delimiter_type,#delimited_string,t.num),0)-t.num,8000)
FROM cte2 t
)
SELECT delimited_item_num = ROW_NUMBER() OVER(ORDER BY t.num)
,delimited_value = SUBSTRING(#delimited_string, t.num, t.[len])
FROM cte3 t;
GO
It will take a varchar value up to 8000 characters and will return a table with the delimited elements broken into rows. In your example, you'll want to use an outer apply to turn those delimited values into separate rows:
SELECT my_table.id, my_table.assess_id, question_key, my_table.delimited_items.item_value
FROM my_table
OUTER APPLY(
SELECT delimited_value AS item_value
FROM my_database.dbo.split_delimiter(my_table.item_value, ',')
) AS delimited_items

Insert data into temp table from 2 source tables

I have 2 SELECT statements that both return 13 rows from dirrefernt tables
I would like to create 1 temporary table with 2 columns and insert the 2 result rows into the 2 columns. Is there a way to do this?
So
1 - SELECT INPOS FROM TABLE1 returns
1,2,3,4,5,6,7,18,9,10,11,12,13
2 - SELECT CODE FROM TABLE2 returns
CODEA,CODEB,CODEC,CODED,CODEE,CODEF,CODEG,CODEH,CODEI,CODEJ,CODEK,CODEL,CODEM
I would like my temporary table to be
1 | CODEA
2 | CODEB
3 | CODEC
4 | CODED
5 | CODEE
6 | CODEF
7 | CODEG
8 | CODEH
9 | CODEI
10 | CODEJ
11 | CODEK
12 | CODEL
13 | CODEM
Try this:
WITH T1 AS (
SELECT ROW_NUMBER() OVER(ORDER BY INPOS) ID, INPOS FROM TABLE1
),
WITH T2 AS
(
SELECT ROW_NUMBER() OVER(ORDER BY CODE) ID, CODE FROM TABLE2
),
SELECT T1.INPOS, T2.CODE
FROM T1 INNER JOIN T2 ON T1.ID = T2.ID
Try something like this:
SELECT a.impos, b.code
FROM (
(
SELECT impos, RANK() OVER (ORDER BY impos ASC) AS link
FROM table1
) AS a INNER JOIN (
SELECT code, RANK() OVER (ORDER BY code ASC) AS link
FROM table2
) AS b ON a.link = b.link
)
sqlfiddle demo