How to map multiple nested parent and child ids in BigQuery - google-bigquery

I have the data produced by the following query:
with example_data as (
select 'A' as category, 1 as child_id, 0 as parent_id, 'a' as label
union all select 'A' as category, 2 as child_id, 1 as parent_id, 'b' as label
union all select 'A' as category, 3 as child_id, 1 as parent_id, 'c' as label
union all select 'A' as category, 4 as child_id, 1 as parent_id, 'd' as label
union all select 'A' as category, 5 as child_id, 2 as parent_id, 'e' as label
union all select 'A' as category, 6 as child_id, 2 as parent_id, 'f' as label
union all select 'A' as category, 7 as child_id, 2 as parent_id, 'g' as label
union all select 'A' as category, 8 as child_id, 3 as parent_id, 'h' as label
union all select 'A' as category, 9 as child_id, 3 as parent_id, 'i' as label
union all select 'A' as category, 10 as child_id, 3 as parent_id, 'j' as label
union all select 'B' as category, 1 as child_id, 0 as parent_id, 'k' as label
union all select 'B' as category, 2 as child_id, 1 as parent_id, 'l' as label
union all select 'B' as category, 3 as child_id, 1 as parent_id, 'm' as label
union all select 'B' as category, 4 as child_id, 1 as parent_id, 'n' as label
union all select 'B' as category, 5 as child_id, 2 as parent_id, 'o' as label
union all select 'B' as category, 6 as child_id, 2 as parent_id, 'p' as label
union all select 'B' as category, 7 as child_id, 2 as parent_id, 'q' as label
union all select 'B' as category, 8 as child_id, 3 as parent_id, 'r' as label
union all select 'B' as category, 9 as child_id, 3 as parent_id, 's' as label
union all select 'B' as category, 10 as child_id, 3 as parent_id, 't' as label
)
select *
from example_data
In this table, we have some labels where each one has a parent_id. What I want to do is to get the following result. The first row can be explained as follows: label e has the label b as a parent, and label b has the label a as a parent, based on the parent ids.
category label_1 label_2 label_3
A a b e
A a b f
A a b g
A a c h
A a c i
A a c j
A a d null
B k l o
B k l p
B k l q
B k m r
B k m s
B k m t
B k n null

Consider the approach below:
with recursive iterations as (
select category, child_id, label as labels
from example_data where parent_id = 0
union all
select e.category, e.child_id, concat(labels, '||', label)
from iterations i join example_data e
on e.category = i.category and e.parent_id = i.child_id
)
select * except(labels) from (
select * from (
select category, labels from iterations
qualify not ifnull(starts_with(lead(labels) over(partition by category order by labels), labels || '||'), false)
), unnest(split(labels, '||')) label with offset
)
pivot (any_value(label) as label for offset + 1 in (1, 2, 3))
order by category, labels
If applied to the sample data in your question, this produces the desired output shown there (columns category, label_1, label_2, label_3).

I believe there must be a better way than my initial try. Hope this is helpful until somebody finds it.
WITH RECURSIVE tree AS (
SELECT category, 1 AS level, parent_id, child_id, label, [label] labels
FROM example_data WHERE parent_id = 0
UNION ALL
SELECT e.category, level + 1 AS level, e.parent_id, e.child_id, e.label, ARRAY_CONCAT(labels,[e.label])
FROM tree t JOIN example_data e ON e.category = t.category AND e.parent_id = t.child_id
),
filtered AS (
SELECT * EXCEPT(labels, max_level, i), i + 1 AS level FROM (
SELECT * REPLACE(IF(ARRAY_LENGTH(labels) <> max_level, ARRAY_CONCAT(labels, ['null']), labels) AS labels) FROM (
SELECT category, labels, MAX(ARRAY_LENGTH(labels)) OVER () max_level,
FROM tree
WHERE child_id NOT IN (SELECT parent_id FROM example_data)
)
), UNNEST(labels) label WITH OFFSET i
),
pivotted AS (
SELECT *
FROM filtered
PIVOT (ARRAY_AGG(label IGNORE NULLS) AS level FOR level IN (1, 2, 3))
)
-- Unnest a pivotted result to make final output.
SELECT category, lv1 AS level_1, lv2 AS level_2, lv3 AS level_3
FROM pivotted, UNNEST(level_1) lv1 WITH OFFSET
JOIN UNNEST(level_2) lv2 WITH OFFSET USING(offset)
JOIN UNNEST(level_3) lv3 WITH OFFSET USING(offset)
ORDER BY 1, 2, 3, 4
;
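Both recursive CTEs above do the same thing: start from the roots (parent_id = 0), follow child links within a category, and keep one label path per leaf. As a rough illustration of that traversal outside SQL, here is a minimal Python sketch. The function name `leaf_paths` and the `depth` parameter are made up for this example, and only category A from the question is included to keep it short.

```python
from collections import defaultdict

# Rows from the question (category A only): (category, child_id, parent_id, label).
rows = [
    ("A", 1, 0, "a"), ("A", 2, 1, "b"), ("A", 3, 1, "c"), ("A", 4, 1, "d"),
    ("A", 5, 2, "e"), ("A", 6, 2, "f"), ("A", 7, 2, "g"),
    ("A", 8, 3, "h"), ("A", 9, 3, "i"), ("A", 10, 3, "j"),
]

def leaf_paths(rows, depth=3):
    """Return one (category, label_1, ..., label_depth) tuple per leaf node."""
    # Index children by (category, parent_id), like the join condition in the CTE.
    children = defaultdict(list)
    for category, child_id, parent_id, label in rows:
        children[(category, parent_id)].append((child_id, label))

    result = []

    def walk(category, node_id, path):
        kids = children.get((category, node_id), [])
        if not kids and path:
            # Leaf reached: pad short paths with None (NULL in the SQL output).
            padded = path + [None] * (depth - len(path))
            result.append((category, *padded))
        for child_id, label in kids:
            walk(category, child_id, path + [label])

    # parent_id = 0 marks the root level in the sample data.
    for category in sorted({r[0] for r in rows}):
        walk(category, 0, [])
    return result

for row in leaf_paths(rows):
    print(row)
```

Short branches such as a, d are padded with None, which corresponds to the null in the desired output.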

Related

How to write SQL join to find description of id using Oracle?

I have 2 input tables, and I need output in string format.
I tried the following query, but it does not work. How can I get the above output?
with
cte1 as --table 1
(select 1 as id , 'A' as abc from dual
union
select 2 as id , 'B' as abc from dual
union
select 3 as id , 'C' as abc from dual
union
select 4 as id , 'D' as abc from dual
union
select 5 as id , 'E' as abc from dual
union
select 6 as id , 'F' as abc from dual
),
cte2 as --table2
(select 1 as id, 3 as name from dual
union
select 1 as id, 5 as name from dual
union
select 1 as id, 4 as name from dual
union
select 2 as id, 3 as name from dual
union
select 2 as id, 6 as name from dual
)
SELECT e.id, e.abc, m.id as mgr, e.abc, c.*
FROM
cte1 e, cte2 m, cte2 c
WHERE e.id = m.id
and
e.id=c.name;
You are trying to join each row in table 1 to two rows in table 2, and the conditions can never both be true.
You want to join each row in table 2 to two rows in table 1:
SELECT e.abc, m.abc
FROM cte2 c, cte1 e, cte1 m
WHERE e.id = c.id
AND m.id = c.name
ORDER BY c.id, c.name;
A A
- -
A C
A D
A E
B C
B F
or with 'modern' join syntax, which you should really be using:
SELECT e.abc, m.abc
FROM cte2 c
JOIN cte1 e ON e.id = c.id
JOIN cte1 m ON m.id = c.name
ORDER BY c.id, c.name;
A A
- -
A C
A D
A E
B C
B F

Fetching Specific Group of Rows

I have a table with 'Name', 'Flag' and some other columns. I want to select specific group of rows from table. Data is already sorted based on another time-stamp column.
Name Flag
------ ------
A D
B D
C D
D I
E I
D D
E D
B I
D I
F I
I want to fetch 1st set of 'D' Flag and last set of 'I' flag. Is it possible in SQL (only select statement, not PL/SQL) somehow?
Desired Output:
Name Flag
------ ------
A D
B D
C D
B I
D I
F I
SQL tables represent unordered sets. So, there is no "first" or "last", unless you have a column that specifies the ordering. Note that this applies to both SQL queries and to PL/SQL code. Of course, you specify that you have two columns, so no such column exists in your data.
But let me assume that you do have one. If so, you can do:
select t.*
from t
where (t.flag = 'D' and
t.orderingcol < (select min(t2.orderingcol) from t t2 where t2.flag <> 'D')
) or
(t.flag = 'I' and
t.orderingcol > (select max(t2.orderingcol) from t t2 where t2.flag <> 'I')
)
order by t.orderingcol;
Assuming you have some sort of column that determines the ordering of the result set (e.g. the id column in my query below), this is easy enough to do with a technique known as Tabibitosan:
WITH sample_data AS (SELECT 1 ID, 'A' NAME, 'D' flag FROM dual UNION ALL
SELECT 2 ID, 'B' NAME, 'D' flag FROM dual UNION ALL
SELECT 3 ID, 'C' NAME, 'D' flag FROM dual UNION ALL
SELECT 4 ID, 'D' NAME, 'I' flag FROM dual UNION ALL
SELECT 5 ID, 'E' NAME, 'I' flag FROM dual UNION ALL
SELECT 6 ID, 'D' NAME, 'D' flag FROM dual UNION ALL
SELECT 7 ID, 'E' NAME, 'D' flag FROM dual UNION ALL
SELECT 8 ID, 'B' NAME, 'I' flag FROM dual UNION ALL
SELECT 9 ID, 'D' NAME, 'I' flag FROM dual UNION ALL
SELECT 10 ID, 'F' NAME, 'I' flag FROM dual)
SELECT ID,
NAME,
flag
FROM (SELECT ID,
NAME,
flag,
grp,
MIN(CASE WHEN flag = 'D' THEN grp END) OVER (PARTITION BY flag) min_d_grp,
MAX(CASE WHEN flag = 'I' THEN grp END) OVER (PARTITION BY flag) max_i_grp
FROM (SELECT ID,
NAME,
flag,
row_number() OVER (ORDER BY ID) - row_number() OVER (PARTITION BY flag ORDER BY ID) grp
FROM sample_data
WHERE flag IN ('D', 'I')))
WHERE (flag = 'D' AND grp = min_d_grp)
OR (flag = 'I' AND grp = max_i_grp)
ORDER BY id;
ID NAME FLAG
---------- ---- ----
1 A D
2 B D
3 C D
8 B I
9 D I
10 F I
This query uses the tabibitosan method to generate an additional "grp" column, which you can then use to find the lowest number for the D flag rows and the highest for the I flag rows.
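To make the "grp" column less mysterious, here is a small Python sketch of the same arithmetic: the overall row number minus the row number within the same flag stays constant across a run of consecutive rows with that flag. The function name `tabibitosan` is only a label for this illustration, and the rows are the sample data from the query above.

```python
from collections import defaultdict

# (id, name, flag) rows in their given order, as in sample_data above.
rows = [
    (1, "A", "D"), (2, "B", "D"), (3, "C", "D"), (4, "D", "I"), (5, "E", "I"),
    (6, "D", "D"), (7, "E", "D"), (8, "B", "I"), (9, "D", "I"), (10, "F", "I"),
]

def tabibitosan(rows):
    """Attach grp = overall row number - row number within the same flag."""
    per_flag = defaultdict(int)  # running row_number() per flag
    out = []
    for overall_rn, (id_, name, flag) in enumerate(rows, start=1):
        per_flag[flag] += 1
        out.append((id_, name, flag, overall_rn - per_flag[flag]))
    return out

grouped = tabibitosan(rows)

# First run of 'D' has the smallest grp among D rows,
# last run of 'I' has the largest grp among I rows.
min_d = min(g for _, _, f, g in grouped if f == "D")
max_i = max(g for _, _, f, g in grouped if f == "I")
picked = [(i, n, f) for i, n, f, g in grouped
          if (f == "D" and g == min_d) or (f == "I" and g == max_i)]
print(picked)
```

Within one flag, grp grows from run to run, which is why MIN and MAX over grp pick the first and last runs.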
ETA: This may or may not perform better than Gordon's answer, but I would recommend you test both answers to see which works better for your tables/indexes/data etc.

How to count consecutive duplicates in a table?

I have the following question:
I want to find the consecutive duplicates.
SLNO NAME PG
1 A1 NO
2 A2 YES
3 A3 NO
4 A4 YES
6 A5 YES
7 A6 YES
8 A7 YES
9 A8 YES
10 A9 YES
11 A10 NO
12 A11 YES
13 A12 NO
14 A14 NO
We consider the value of the PG column. I need the output to be 6, which is the maximum count of consecutive duplicates.
It can be done with the Tabibitosan method. Run this to understand it:
with a as(
select 1 slno, 'A' pg from dual union all
select 2 slno, 'A' pg from dual union all
select 3 slno, 'B' pg from dual union all
select 4 slno, 'A' pg from dual union all
select 5 slno, 'A' pg from dual union all
select 6 slno, 'A' pg from dual
)
select slno, pg, newgrp, sum(newgrp) over (order by slno) grp
from(
select slno,
pg,
case when pg <> nvl(lag(pg) over (order by slno),1) then 1 else 0 end newgrp
from a
);
Newgrp means a new group is found.
Result:
SLNO PG NEWGRP GRP
1 A 1 1
2 A 0 1
3 B 1 2
4 A 1 3
5 A 0 3
6 A 0 3
Now, just use a group by with count, to find the group with maximum number of occurrences:
with a as(
select 1 slno, 'A' pg from dual union all
select 2 slno, 'A' pg from dual union all
select 3 slno, 'B' pg from dual union all
select 4 slno, 'A' pg from dual union all
select 5 slno, 'A' pg from dual union all
select 6 slno, 'A' pg from dual
),
b as(
select slno, pg, newgrp, sum(newgrp) over (order by slno) grp
from(
select slno, pg, case when pg <> nvl(lag(pg) over (order by slno),1) then 1 else 0 end newgrp
from a
)
)
select max(cnt)
from (
select grp, count(*) cnt
from b
group by grp
);
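The two steps above (flag each row where PG changes, take a running sum of the flags to number the groups, then count rows per group) can be sketched in Python as follows. This is only an illustration of the logic on the six-row sample; the variable names mirror the column aliases in the query.

```python
from collections import Counter
from itertools import accumulate

# pg values of the six sample rows, ordered by slno.
pg = ["A", "A", "B", "A", "A", "A"]

# newgrp = 1 when the value differs from the previous row (or on the first row).
newgrp = [1 if i == 0 or pg[i] != pg[i - 1] else 0 for i in range(len(pg))]

# grp = running sum of newgrp, i.e. sum(newgrp) over (order by slno).
grp = list(accumulate(newgrp))

# count(*) per grp, then max(cnt).
longest = max(Counter(grp).values())

print(newgrp)   # [1, 0, 1, 1, 0, 0]
print(grp)      # [1, 1, 2, 3, 3, 3]
print(longest)  # 3
```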
with test as (
select 1 slno,'A1' name ,'NO' pg from dual union all
select 2,'A2','YES' from dual union all
select 3,'A3','NO' from dual union all
select 4,'A4','YES' from dual union all
select 6,'A5','YES' from dual union all
select 7,'A6','YES' from dual union all
select 8,'A7','YES' from dual union all
select 9,'A8','YES' from dual union all
select 10,'A9','YES' from dual union all
select 11,'A10','NO' from dual union all
select 12,'A11','YES' from dual union all
select 13,'A12','NO' from dual union all
select 14,'A14','NO' from dual),
consecutive as (select row_number() over(order by slno) rr, x.*
from test x)
select x.* from Consecutive x
left join Consecutive y on x.rr = y.rr+1 and x.pg = y.pg
where y.rr is not null
order by x.slno
You can control the output with the condition in the WHERE clause:
"where y.rr is not null" returns the consecutive duplicates;
"where y.rr is null" returns the "distinct" values.
Just for completeness, here's the actual Tabibitosan method:
with sample_data as (select 1 slno, 'A1' name, 'NO' pg from dual union all
select 2 slno, 'A2' name, 'YES' pg from dual union all
select 3 slno, 'A3' name, 'NO' pg from dual union all
select 4 slno, 'A4' name, 'YES' pg from dual union all
select 6 slno, 'A5' name, 'YES' pg from dual union all
select 7 slno, 'A6' name, 'YES' pg from dual union all
select 8 slno, 'A7' name, 'YES' pg from dual union all
select 9 slno, 'A8' name, 'YES' pg from dual union all
select 10 slno, 'A9' name, 'YES' pg from dual union all
select 11 slno, 'A10' name, 'NO' pg from dual union all
select 12 slno, 'A11' name, 'YES' pg from dual union all
select 13 slno, 'A12' name, 'NO' pg from dual union all
select 14 slno, 'A14' name, 'NO' pg from dual)
-- end of mimicking a table called "sample_data" containing your data; see SQL below:
select max(cnt) max_pg_in_queue
from (select count(*) cnt
from (select slno,
name,
pg,
row_number() over (order by slno)
- row_number() over (partition by pg
order by slno) grp
from sample_data)
where pg = 'YES'
group by grp);
MAX_PG_IN_QUEUE
---------------
6
SELECT MAX(consecutives) -- Block 1
FROM (
SELECT t1.pg, t1.slno, COUNT(*) AS consecutives -- Block 2
FROM test t1 INNER JOIN test t2 ON t1.pg = t2.pg
WHERE t1.slno <= t2.slno
AND NOT EXISTS (
SELECT * -- Block 3
FROM test t3
WHERE t3.slno > t1.slno
AND t3.slno < t2.slno
AND t3.pg != t1.pg
)
GROUP BY t1.pg, t1.slno
);
The query calculates the result in the following way:
Extract all pairs of records that don't have a record with a different value of PG in between (blocks 2 and 3).
Group them by PG value and starting SLNO value; this counts the consecutive values for each [PG, (starting) SLNO] pair (block 2).
Extract the maximum value from block 2 (block 1).
Note that the query may be simplified if the slno field in the table contains consecutive values, but this does not seem to be your case (in your example the record with SLNO = 5 is missing).
Only requiring a single aggregation query and no joins (the rest of the calculation can be done with ROW_NUMBER, LAG and LAST_VALUE):
SELECT MAX( num_before_in_queue ) AS max_sequential_in_queue
FROM (
-- rn of the current row minus rn of the row where the current run started, plus 1
SELECT rn - LAST_VALUE( has_changed ) IGNORE NULLS OVER ( ORDER BY rn ) + 1
AS num_before_in_queue
FROM (
SELECT pg,
ROW_NUMBER() OVER ( ORDER BY slno ) AS rn,
CASE pg WHEN LAG( pg ) OVER ( ORDER BY slno )
THEN NULL
ELSE ROW_NUMBER() OVER ( ORDER BY slno )
END AS has_changed
FROM table_name
)
WHERE pg = 'YES'
);
Try to use row_number()
select
SLNO,
Name,
PG,
row_number() over (partition by PG order by PG) as "Consecutive"
from
<table>
order by
SLNO,
NAME,
PG
This should work with minor tweaking.
--EDIT--
Sorry, partition by PG.
The partitioning tells row_number when to start a new sequence.

Oracle SQL query using case when, compacting null fields

I have a table like this:
Items
id group old_new object
1 A O pen
2 A N house
3 B O dog
4 B O cat
5 C N mars
6 C O sun
7 C N moon
8 C O earth
I would like the select return:
Items
group new_object old_object
A house pen
B null dog
B null cat
C mars sun
C moon earth
If I try:
select id,
case when old_new = 'N' then object end as new_object,
case when old_new = 'O' then object end as old_object
from the_table
order by id;
I get 8 rows with many fields null.
E.g. the last rows:
group new_object old_object
C mars null
C null sun
C moon null
C null earth
But for group C I want only 2 rows...
This is not like the other question 'Oracle sql join same table ecc...' because here I don't want null results.
I'm going to make the assumption that Old and New records are paired in the order they appear based on the ID value. With that assumption the following query:
WITH DTA(ID, GRP, OLD_NEW, OBJECT) AS (
select 1, 'A', 'O', 'pen' from dual union all
select 2, 'A', 'N', 'house' from dual union all
select 3, 'B', 'O', 'dog' from dual union all
select 4, 'B', 'O', 'cat' from dual union all
select 5, 'C', 'N', 'mars' from dual union all
select 6, 'C', 'O', 'sun' from dual union all
select 7, 'C', 'N', 'moon' from dual union all
select 8, 'C', 'O', 'earth' from dual
), dta2 as (
select dta.*
, row_number() over (partition by GRP, old_new order by id) rn
from dta
)
select coalesce(n.grp, o.grp) grp
, n.object new_object
, o.object old_object
from (select * from dta2 where old_new = 'N') n
full join (select * from dta2 where old_new = 'O') o
on n.grp = o.grp
and n.rn = o.rn;
Aside from the sample data section (with dta), this script first uses the analytic function ROW_NUMBER() to add a sequential number partitioned by the group and old_new columns. It then performs a full outer join on two inline views of the dta2 factored subquery, one for the old objects and one for the new objects. The result, at least for this data set, is:
GRP NEW_OBJECT OLD_OBJECT
--- ---------- ----------
A house pen
B dog
B cat
C mars sun
C moon earth
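The pairing rule used above (the n-th new object of a group goes with the n-th old object of the same group, ordered by id, and an unmatched side stays null) can be illustrated with a short Python sketch. The function name `pair_old_new` is invented for this example, and `zip_longest` plays the role of the full outer join on the row number.

```python
from collections import defaultdict
from itertools import zip_longest

# (id, grp, old_new, object) rows from the question.
items = [
    (1, "A", "O", "pen"), (2, "A", "N", "house"),
    (3, "B", "O", "dog"), (4, "B", "O", "cat"),
    (5, "C", "N", "mars"), (6, "C", "O", "sun"),
    (7, "C", "N", "moon"), (8, "C", "O", "earth"),
]

def pair_old_new(items):
    """Pair the n-th 'N' row with the n-th 'O' row inside each group."""
    by_group = defaultdict(lambda: {"N": [], "O": []})
    # Sorting by id gives the same order as row_number() ... order by id.
    for _id, grp, old_new, obj in sorted(items):
        by_group[grp][old_new].append(obj)

    result = []
    for grp in sorted(by_group):
        lists = by_group[grp]
        # zip_longest fills the shorter side with None, like a full outer join.
        for new_obj, old_obj in zip_longest(lists["N"], lists["O"]):
            result.append((grp, new_obj, old_obj))
    return result

for row in pair_old_new(items):
    print(row)
```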
In the first step, assign an index (IDX) of the change within your group. I'm using order by ID, but this is up to you. The important thing is that the old and new values are uniquely connected by GRP and IDX.
In the next step, let PIVOT work for you (I'm using the data from @Sentinel, thx!)
WITH DTA(ID, GRP, OLD_NEW, OBJECT) AS (
select 1, 'A', 'O', 'pen' from dual union all
select 2, 'A', 'N', 'house' from dual union all
select 3, 'B', 'O', 'dog' from dual union all
select 4, 'B', 'O', 'cat' from dual union all
select 5, 'C', 'N', 'mars' from dual union all
select 6, 'C', 'O', 'sun' from dual union all
select 7, 'C', 'N', 'moon' from dual union all
select 8, 'C', 'O', 'earth' from dual
), DTA2 as (
SELECT
ROW_NUMBER() OVER (PARTITION BY GRP,OLD_NEW order by ID) as IDX,
GRP, OLD_NEW, OBJECT
from DTA
)
select * from DTA2
PIVOT (max(OBJECT) OBJECT for (OLD_NEW) in
('N' as "NEW",
'O' as "OLD"
))
order by GRP;
result
IDX, GRP, NEW_OBJECT, OLD_OBJECT
1 A house pen
1 B dog
2 B cat
2 C moon earth
1 C mars sun
Here's an alternative using PIVOT to get the results:
with items as (select 1 id, 'A' grp, 'O' old_new, 'pen' obj from dual union all
select 2 id, 'A' grp, 'N' old_new, 'house' obj from dual union all
select 3 id, 'B' grp, 'O' old_new, 'dog' obj from dual union all
select 4 id, 'B' grp, 'O' old_new, 'cat' obj from dual union all
select 5 id, 'C' grp, 'N' old_new, 'mars' obj from dual union all
select 6 id, 'C' grp, 'O' old_new, 'sun' obj from dual union all
select 7 id, 'C' grp, 'N' old_new, 'moon' obj from dual union all
select 8 id, 'C' grp, 'O' old_new, 'earth' obj from dual)
-- end of mimicking your items table with data in it. See SQL below:
select grp,
new_object,
old_object
from (select grp,
old_new,
obj,
row_number() over (partition by grp, old_new order by id) rn
from items)
pivot (max(obj)
for old_new in ('N' new_object,
'O' old_object))
order by grp,
rn;
GRP NEW_OBJECT OLD_OBJECT
--- ---------- ----------
A house pen
B dog
B cat
C mars sun
C moon earth
Provided that
there's at most one new object for each old,
there's no new object without old object, and
there's at most one old object for any group (this is not true for your sample data, but in comments you indicate you're interested in such solution as well)
a simpler query may be used than for the general case:
-- grp stands in for the question's "group" column, because GROUP is a reserved word
select
old.grp as grp, new.object as new_object, old.object as old_object
from
(select grp, object from my_table where old_new = 'O') old
left join
(select grp, object from my_table where old_new = 'N') new
on (old.grp = new.grp)

Get distinct rows based on priority?

I have a table as below. I am using Oracle 10g.
TableA
------
id status
---------------
1 R
1 S
1 W
2 R
I need to get distinct ids along with their status. If I query for distinct ids and their status, I get all 4 rows,
but I should get only 2, one per id.
Here id 1 has 3 distinct statuses, and I should get only one row for it, based on priority.
First priority is 'S', second priority is 'W' and third priority is 'R'.
In my case I should get two records as below.
id status
--------------
1 S
2 R
How can I do that? Please help me.
Thanks!
select
id,
max(status) keep (dense_rank first order by instr('SWR', status)) as status
from TableA
group by id
order by 1
select id , status from (
select TableA.*, ROW_NUMBER()
OVER (PARTITION BY TableA.id ORDER BY DECODE(
TableA.status,
'S',1,
'W',2,
'R',3,
4)) AS row_no
FROM TableA)
where row_no = 1
This is first thing i would do, but there may be a better way.
Select id, case when status=3 then 'S'
when status=2 then 'W'
when status=1 then 'R' end as status
from(
select id, max(case when status='S' then 3
when status='W' then 2
when status='R' then 1
end) status
from tableA
group by id
);
To get it done you can write a similar query:
-- sample of data from your question
with t1(id , status) as (
  select 1, 'R' from dual union all
  select 1, 'S' from dual union all
  select 1, 'W' from dual union all
  select 2, 'R' from dual
)
select id -- actual query
     , status
from ( select id
            , status
            , row_number() over(partition by id
                                order by case
                                           when upper(status) = 'S'
                                           then 1
                                           when upper(status) = 'W'
                                           then 2
                                           when upper(status) = 'R'
                                           then 3
                                         end
                               ) as rn
       from t1
     ) q
where q.rn = 1
;
ID STATUS
---------- ------
1 S
2 R
select id,status from
(select id,status,decode(status,'S',1,'W',2,'R',3) st from TableA) where (id,st) in
(select id,min(st) from (select id,status,decode(status,'S',1,'W',2,'R',3) st from TableA) group by id)
Something like this???
with xx as(
  select 1 id, 'R' status from dual UNION ALL
  select 1, 'S' from dual UNION ALL
  select 1, 'W' from dual UNION ALL
  select 2, 'R' from dual
)
select
  id,
  DECODE(
    MIN(
      DECODE(status,'S',1,'W',2,'R',3)
    ),
    1,'S',2,'W',3,'R') "status"
from xx
group by id;
ID s
---------- -
1 S
2 R
Here, the logic is quite simple.
Use DECODE to assign a priority to each status, take the MIN (i.e. the one with the highest priority), and DECODE it back to get its status.
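The same idea (map each status to a priority number, keep the smallest number per id, map it back) can be sketched in a few lines of Python. The `PRIORITY` mapping and the function name `best_status` are made up for this illustration; the rows are the sample data from the question.

```python
# Lower number = higher priority, matching DECODE(status,'S',1,'W',2,'R',3).
PRIORITY = {"S": 1, "W": 2, "R": 3}

# (id, status) rows from the question.
rows = [(1, "R"), (1, "S"), (1, "W"), (2, "R")]

def best_status(rows):
    """Return [(id, status)] keeping the highest-priority status per id."""
    best = {}
    for id_, status in rows:
        # Keep the status whose priority number is the smallest seen so far.
        if id_ not in best or PRIORITY[status] < PRIORITY[best[id_]]:
            best[id_] = status
    return sorted(best.items())

print(best_status(rows))  # [(1, 'S'), (2, 'R')]
```

A status missing from the mapping raises a KeyError here, whereas DECODE without a default would return NULL, so unexpected values need a decision either way.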
Using MOD() example with added values:
SELECT id, val, distinct_val
FROM
(
SELECT id, val
, ROW_NUMBER() OVER (ORDER BY id) row_seq
, MOD(ROW_NUMBER() OVER (ORDER BY id), 2) even_row
, (CASE WHEN id = MOD(ROW_NUMBER() OVER (ORDER BY id), 2) THEN NULL ELSE val END) distinct_val
FROM
(
SELECT 1 id, 'R' val FROM dual
UNION
SELECT 1 id, 'S' val FROM dual
UNION
SELECT 1 id, 'W' val FROM dual
UNION
SELECT 2 id, 'R' val FROM dual
UNION -- comment below for orig data
SELECT 3 id, 'K' val FROM dual
UNION
SELECT 4 id, 'G' val FROM dual
UNION
SELECT 1 id, 'W' val FROM dual
))
WHERE distinct_val IS NOT NULL
/
ID VAL DISTINCT_VAL
--------------------------
1 S S
2 R R
3 K K
4 G G