Oracle: Recursively self referential join with nth level record - sql

I have a self-referential table like this:
id |level | parent_id
----------------------
1 |1 |null
2 |1 |null
3 |2 |1
4 |2 |1
5 |2 |2
6 |3 |5
7 |3 |3
8 |4 |7
9 |4 |6
------------------------
I need the nth-level parent in the result. For example, with the 2nd-level parent:
id |level | parent_id| second_level_parent_id
------------------------------------------------
1 |1 |null |null
2 |1 |null |null
3 |2 |1 |null
4 |2 |1 |null
5 |2 |2 |null
6 |3 |5 |5
7 |3 |3 |3
8 |4 |7 |3
9 |4 |6 |5
-------------------------------------------------

This works for me:
SELECT m.*,
CONNECT_BY_ROOT id AS second_level_parent_id
FROM my_table m
WHERE CONNECT_BY_ROOT level =2
CONNECT BY prior id = parent_id;
Thanks, @Jozef Dúc.
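For readers without an Oracle instance, the same lookup can be sketched with a recursive CTE instead of CONNECT BY. This is a different technique from the query above, not a translation of it. The sketch below runs in SQLite through Python's sqlite3 module and assumes the sample data from the question; the function name nth_level_parents and the CTE name ancestors are made up for illustration.

```python
import sqlite3

def nth_level_parents(n):
    """Return (id, level, parent_id, nth_level_parent_id) for the sample data."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE my_table (id INTEGER PRIMARY KEY, level INTEGER, parent_id INTEGER)"
    )
    conn.executemany(
        "INSERT INTO my_table VALUES (?, ?, ?)",
        [(1, 1, None), (2, 1, None), (3, 2, 1), (4, 2, 1), (5, 2, 2),
         (6, 3, 5), (7, 3, 3), (8, 4, 7), (9, 4, 6)],
    )
    query = """
    WITH RECURSIVE ancestors(id, ancestor_id) AS (
        -- direct parent of every non-root row
        SELECT id, parent_id FROM my_table WHERE parent_id IS NOT NULL
        UNION ALL
        -- climb one more step: the parent of the ancestor found so far
        SELECT a.id, m.parent_id
        FROM ancestors a
        JOIN my_table m ON m.id = a.ancestor_id
        WHERE m.parent_id IS NOT NULL
    )
    SELECT m.id, m.level, m.parent_id, p.ancestor_id AS nth_level_parent_id
    FROM my_table m
    LEFT JOIN (
        -- keep only the proper ancestors that sit on the requested level
        SELECT a.id AS child_id, a.ancestor_id
        FROM ancestors a
        JOIN my_table anc ON anc.id = a.ancestor_id
        WHERE anc.level = ?
    ) p ON p.child_id = m.id
    ORDER BY m.id
    """
    rows = conn.execute(query, (n,)).fetchall()
    conn.close()
    return rows

for row in nth_level_parents(2):
    print(row)
```

Because only proper ancestors are collected, rows that are themselves on level 2 or above get NULL, which matches the desired output in the question.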

Related

Postgres - How to achieve UNION behaviour with UNION ALL?

I have a table with parent and child ids.
create table if not exists stack (
parent int,
child int
)
Each parent can have multiple children and each child can have multiple children again.
insert into stack (parent, child) values
(1,2),
(2,3),
(3,4),
(4,5),
(5,6),
(6,7),
(7,8),
(8,9),
(9,null),
(1,7),
(7,8),
(8,9),
(9,null);
The data looks like this.
|parent|child|
|------|-----|
|1 |2 |
|2 |3 |
|3 |4 |
|4 |5 |
|5 |6 |
|6 |7 |
|7 |8 |
|8 |9 |
|9 |NULL |
|1 |7 |
|7 |8 |
|8 |9 |
|9 |NULL |
I'd like to find all children. I can use a recursive CTE with UNION, which removes duplicate rows.
with recursive cte as (
select
child
from
stack
where
stack.parent = 1
union
select
stack.child
from
cte
left join stack on
cte.child = stack.parent
where
cte.child is not null
)
select * from cte;
This gives me the result I'd like to achieve.
|child|
|-----|
|2 |
|7 |
|3 |
|8 |
|4 |
|9 |
|5 |
|NULL |
|6 |
However I'd like to include the depth / level and also the path for each node. I can do this using a different recursive cte.
with recursive cte as (
select
parent,
child,
0 as level,
array[parent,
child] as path
from
stack
where
stack.parent = 1
union all
select
stack.parent,
stack.child,
cte.level + 1,
cte.path || stack.child
from
cte
left join stack on
cte.child = stack.parent
where
cte.child is not null
)
select * from cte;
That gives me this data.
|parent|child|level|path |
|------|-----|-----|--------------------|
|1 |2 |0 |{1,2} |
|1 |7 |0 |{1,7} |
|2 |3 |1 |{1,2,3} |
|7 |8 |1 |{1,7,8} |
|7 |8 |1 |{1,7,8} |
|3 |4 |2 |{1,2,3,4} |
|8 |9 |2 |{1,7,8,9} |
|8 |9 |2 |{1,7,8,9} |
|8 |9 |2 |{1,7,8,9} |
|8 |9 |2 |{1,7,8,9} |
|4 |5 |3 |{1,2,3,4,5} |
|9 | |3 |{1,7,8,9,} |
|9 | |3 |{1,7,8,9,} |
|9 | |3 |{1,7,8,9,} |
|9 | |3 |{1,7,8,9,} |
|9 | |3 |{1,7,8,9,} |
|9 | |3 |{1,7,8,9,} |
|9 | |3 |{1,7,8,9,} |
|9 | |3 |{1,7,8,9,} |
|5 |6 |4 |{1,2,3,4,5,6} |
|6 |7 |5 |{1,2,3,4,5,6,7} |
|7 |8 |6 |{1,2,3,4,5,6,7,8} |
|7 |8 |6 |{1,2,3,4,5,6,7,8} |
|8 |9 |7 |{1,2,3,4,5,6,7,8,9} |
|8 |9 |7 |{1,2,3,4,5,6,7,8,9} |
|8 |9 |7 |{1,2,3,4,5,6,7,8,9} |
|8 |9 |7 |{1,2,3,4,5,6,7,8,9} |
|9 | |8 |{1,2,3,4,5,6,7,8,9,}|
|9 | |8 |{1,2,3,4,5,6,7,8,9,}|
|9 | |8 |{1,2,3,4,5,6,7,8,9,}|
|9 | |8 |{1,2,3,4,5,6,7,8,9,}|
|9 | |8 |{1,2,3,4,5,6,7,8,9,}|
|9 | |8 |{1,2,3,4,5,6,7,8,9,}|
|9 | |8 |{1,2,3,4,5,6,7,8,9,}|
|9 | |8 |{1,2,3,4,5,6,7,8,9,}|
My problem is that I have a lot of duplicate data. I'd like to get the same result as the UNION query but with the level and the path.
I tried something like
where
cte.child is not null
and stack.parent not in (cte.parent)
or
where
cte.child is not null
and not exists (select parent from cte where cte.parent = stack.parent)
but the first does not change anything and the second returns an error.
ERROR: recursive reference to query "cte" must not appear within a subquery
Any ideas? Thank you very much!
The problem is duplicate data in your table. For instance, the table states twice that 8 is a direct child of 7. I suggest you remove the duplicate rows and add a unique constraint on the (parent, child) pairs.
If you cannot do so for some reason, make the rows distinct in your query:
with recursive
good_stack as (select distinct * from stack)
,cte as
(
select
parent,
child,
0 as level,
array[parent,
child] as path
from good_stack
where good_stack.parent = 1
union all
select
good_stack.parent,
good_stack.child,
cte.level + 1,
cte.path || good_stack.child
from cte
left join good_stack on cte.child = good_stack.parent
where cte.child is not null and good_stack.child is not null
)
select * from cte;
Demo: https://dbfiddle.uk/?rdbms=postgres_13&fiddle=acb1d7a1a1d26c3fd9caf0e7dedc12b2
(You may also make the columns not nullable. The entries 9|null add no information. If the table were lacking these entries, 9 would still be without a child.)
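The effect of deduplicating the edges first can be checked with a small, self-contained sketch. It runs in SQLite through Python's sqlite3 module rather than in Postgres, so two things are assumptions of the sketch: the path is built as a comma-separated string because SQLite has no array type, and the NULL children are filtered out inside good_stack so that a plain inner join suffices.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE stack (parent INTEGER, child INTEGER)")
# Same rows as in the question, including the duplicates and NULL children.
edges = [(1, 2), (2, 3), (3, 4), (4, 5), (5, 6), (6, 7), (7, 8), (8, 9),
         (9, None), (1, 7), (7, 8), (8, 9), (9, None)]
conn.executemany("INSERT INTO stack VALUES (?, ?)", edges)

rows = conn.execute("""
WITH RECURSIVE
good_stack AS (
    -- each parent/child edge exactly once, without the NULL leaf markers
    SELECT DISTINCT parent, child FROM stack WHERE child IS NOT NULL
),
cte(parent, child, level, path) AS (
    SELECT parent, child, 0, parent || ',' || child
    FROM good_stack
    WHERE parent = 1
    UNION ALL
    SELECT g.parent, g.child, cte.level + 1, cte.path || ',' || g.child
    FROM cte
    JOIN good_stack g ON g.parent = cte.child
)
SELECT parent, child, level, path FROM cte ORDER BY level, path
""").fetchall()
conn.close()

for row in rows:
    print(row)
```

Nodes 7, 8 and 9 still appear twice in the result, because they are reachable from 1 both directly and through 2. Those rows differ in level and path, so they are distinct results rather than duplicates.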

Filling gaps with next not null value

I've been trying to find a solution to this for several days. I have the following dataset.
|id|order|certain_event|order_of_occurrence|
|--|-----|-------------|-------------------|
|a |1 |NULL |NULL |
|a |2 |NULL |NULL |
|a |3 |NULL |NULL |
|a |4 |NULL |NULL |
|a |5 |4 |1 |
|a |6 |NULL |NULL |
|a |7 |NULL |NULL |
|a |8 |4 |2 |
|a |9 |NULL |NULL |
The desired output replaces the null values in the order_of_occurrence column with the next non-null value, like this:
|id|order|certain_event|order_of_occurrence|
|--|-----|-------------|-------------------|
|a |1 |NULL |1 |
|a |2 |NULL |1 |
|a |3 |NULL |1 |
|a |4 |NULL |1 |
|a |5 |4 |1 |
|a |6 |NULL |2 |
|a |7 |NULL |2 |
|a |8 |4 |2 |
|a |9 |NULL |NULL |
I've tried using a subquery for retrieving the non-null values from the order of occurrence column, but I get more than one value returned. Like the following:
SELECT a.*,
CASE
WHEN a.order_of_occurrence IS NOT NULL THEN a.order_of_occurrence
WHEN a.order_of_occurrence IS NULL THEN (SELECT b.order_of_occurrence FROM dataset AS b
WHERE b.order_of_occurrence IS NOT NULL)
END AS corrected_order
FROM dataset AS a
Thanks!
This is a simple task for the IGNORE NULLS option in FIRST/LAST_VALUE:
last_value(order_of_occurrence IGNORE NULLS)
over (partition by id
order by "order" DESC
rows unbounded preceding)
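The window above walks each id in descending "order" and keeps the most recent non-null value, which in ascending terms is the next non-null value. The same logic can be sketched in plain Python to check it against the sample data. The function name fill_with_next_non_null and the tuple layout (id, order, order_of_occurrence) are assumptions made for illustration.

```python
def fill_with_next_non_null(rows):
    """rows: list of (id, order, value). Returns (id, order, filled_value)
    in the original row order, where filled_value is the value at the same
    or the next higher order within the same id that is not None."""
    next_value = {}  # last non-null value seen per id while walking backwards
    filled = {}
    # Walk every id from the highest order to the lowest, like ORDER BY ... DESC.
    for row_id, order, value in sorted(rows, key=lambda r: (r[0], -r[1])):
        if value is not None:
            next_value[row_id] = value
        filled[(row_id, order)] = next_value.get(row_id)
    return [(row_id, order, filled[(row_id, order)]) for row_id, order, _ in rows]

sample = [("a", 1, None), ("a", 2, None), ("a", 3, None), ("a", 4, None),
          ("a", 5, 1), ("a", 6, None), ("a", 7, None), ("a", 8, 2),
          ("a", 9, None)]

for row in fill_with_next_non_null(sample):
    print(row)
```

The last row stays empty because no non-null value follows it, as in the desired output.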

How to get week number using year and day of year using pyspark?

I am trying to add a week number to a table: 1 for the first 7 rows of the dataframe, 2 for the next 7 rows, and so on. See the last column (weeknumber) in the dataframe below for an example.
Basically, I am trying to get the week number from the day of the year and the year.
+-----------+---------------+----------------+------------------+---------+----------+
|datekey |datecalendarday|datecalendaryear|weeknumberofseason|indicator|weeknumber|
+-----------+---------------+----------------+------------------+---------+----------+
|4965 |1 |2018 |2 |1 |1 |
|4966 |2 |2018 |2 |2 |1 |
|4967 |3 |2018 |2 |3 |1 |
|4968 |4 |2018 |2 |4 |1 |
|4969 |5 |2018 |2 |5 |1 |
|4970 |6 |2018 |2 |6 |1 |
|4971 |7 |2018 |3 |7 |1 |
|4972 |8 |2018 |3 |8 |2 |
|4973 |9 |2018 |3 |9 |2 |
|4974 |10 |2018 |3 |10 |2 |
|4975 |11 |2018 |3 |11 |2 |
|4976 |12 |2018 |3 |12 |2 |
|4977 |13 |2018 |3 |13 |2 |
|4978 |14 |2018 |4 |14 |2 |
+-----------+---------------+----------------+------------------+---------+----------+
I stumbled upon a solution that uses the ntile function to get the week number from the days available in that year. Any other, more efficient solution would also help. Thanks in advance.
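If the week number is simply "every block of 7 consecutive days of the year", as the weeknumber column in the sample suggests, it can be computed from datecalendarday with integer arithmetic and needs no window function. The sketch below shows the arithmetic in plain Python; the function name week_number is an assumption. In PySpark the same expression could presumably be written with pyspark.sql.functions.floor on the datecalendarday column, which is not shown here.

```python
def week_number(day_of_year):
    """Map day 1..7 to week 1, day 8..14 to week 2, and so on."""
    if day_of_year < 1:
        raise ValueError("day_of_year must be 1 or greater")
    # Shift to a zero-based day, take whole weeks, shift back to one-based.
    return (day_of_year - 1) // 7 + 1

for day in range(1, 15):
    print(day, week_number(day))
```

Note that this is not the ISO week number: it ignores weekdays and restarts at 1 on the first day of every year.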

How to select rows based on exact count of array elements in a different column

Suppose I have a dataframe like this, where B_C is the concatenation of columns B and C, and selected_B_C is an array formed by picking a few B_C values from within the group.
+-----------+-----------+--------+--------+-----------------+--------+--------------------------------------+
|A |grp_count_A|B |C |B_C |D |selected_B_C |
+-----------+-----------+--------+--------+-----------------+--------+--------------------------------------+
|1 |6 |30261.41|20091201|30261.41_20091201|99945.83|[30261.41_20091201, 39879.85_20080601]|
|1 |6 |30261.41|20081201|30261.41_20081201|99945.83|[30261.41_20091201, 39879.85_20080601]|
|1 |6 |39879.85|20080601|39879.85_20080601|99945.83|[30261.41_20091201, 39879.85_20080601]|
|1 |6 |69804.42|20080117|69804.42_20080117|99945.83|[30261.41_20091201, 39879.85_20080601]|
|1 |6 |99950.3 |20090301|99950.3_20090301 |99945.83|[30261.41_20091201, 39879.85_20080601]|
|1 |6 |99999.23|20080118|99999.23_20080118|99945.83|[30261.41_20091201, 39879.85_20080601]|
|2 |4 |76498.0 |20150501|76498.0_20150501 |183600.0|[[76498.0_20150501, 76498.0_20150501]]|
|2 |4 |76498.0 |20150501|76498.0_20150501 |183600.0|[[76498.0_20150501, 76498.0_20150501]]|
|2 |4 |76498.0 |20150501|76498.0_20150501 |183600.0|[[76498.0_20150501, 76498.0_20150501]]|
|2 |4 |351378.0|20180620|351378.0_20180620|183600.0|[[76498.0_20150501, 76498.0_20150501]]|
+-----------+-----------+--------+--------+-----------------+--------+--------------------------------------+
I want to append a column selected that takes the value 1 if the row's B_C value is found in selected_B_C, and 0 otherwise, so the final dataframe looks like this.
+-----------+-----------+--------+--------+-----------------+--------+--------------------------------------+--------+
|A |grp_count_A|B |C |B_C |D |selected_B_C |selected|
+-----------+-----------+--------+--------+-----------------+--------+--------------------------------------+--------+
|1 |6 |30261.41|20081201|30261.41_20081201|99945.83|[30261.41_20091201, 39879.85_20080601]|0 |
|1 |6 |30261.41|20091201|30261.41_20091201|99945.83|[30261.41_20091201, 39879.85_20080601]|1 |
|1 |6 |39879.85|20080601|39879.85_20080601|99945.83|[30261.41_20091201, 39879.85_20080601]|1 |
|1 |6 |69804.42|20080117|69804.42_20080117|99945.83|[30261.41_20091201, 39879.85_20080601]|0 |
|1 |6 |99950.3 |20090301|99950.3_20090301 |99945.83|[30261.41_20091201, 39879.85_20080601]|0 |
|1 |6 |99999.23|20080118|99999.23_20080118|99945.83|[30261.41_20091201, 39879.85_20080601]|0 |
|2 |4 |76498.0 |20150501|76498.0_20150501 |183600.0|[[76498.0_20150501, 76498.0_20150501]]|1 |
|2 |4 |76498.0 |20150501|76498.0_20150501 |183600.0|[[76498.0_20150501, 76498.0_20150501]]|1 |
|2 |4 |76498.0 |20150501|76498.0_20150501 |183600.0|[[76498.0_20150501, 76498.0_20150501]]|0 |
|2 |4 |351378.0|20180620|351378.0_20180620|183600.0|[[76498.0_20150501, 76498.0_20150501]]|0 |
+-----------+-----------+--------+--------+-----------------+--------+--------------------------------------+--------+
The tricky part is that, for each value in selected_B_C, only as many rows as that value occurs in the array should get 1 in selected.
For example, in group 2 there are 3 records with B_C equal to 76498.0_20150501, but selected_B_C for group 2 contains that value exactly 2 times, so only two of those records should get 1 in selected.
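One way to think about the requirement: number the occurrences of each B_C value within its group, and mark a row only while its occurrence number does not exceed the count of that value in selected_B_C. The plain-Python sketch below illustrates that rule; it is not a PySpark solution. The function name mark_selected and the (A, B_C, selected_B_C) tuple layout are assumptions, and selected_B_C is assumed to be a flat list of strings.

```python
from collections import Counter, defaultdict

def mark_selected(rows):
    """rows: list of (A, B_C, selected_B_C). Returns a list of 0/1 flags,
    one per row, in the input order."""
    seen = defaultdict(int)  # occurrences of each (A, B_C) seen so far
    flags = []
    for group, b_c, selected_b_c in rows:
        allowed = Counter(selected_b_c)  # how often each value may be selected
        seen[(group, b_c)] += 1
        # Counter returns 0 for values that are not in selected_B_C.
        flags.append(1 if seen[(group, b_c)] <= allowed[b_c] else 0)
    return flags

sel_1 = ["30261.41_20091201", "39879.85_20080601"]
sel_2 = ["76498.0_20150501", "76498.0_20150501"]
sample = [
    (1, "30261.41_20091201", sel_1), (1, "30261.41_20081201", sel_1),
    (1, "39879.85_20080601", sel_1), (1, "69804.42_20080117", sel_1),
    (1, "99950.3_20090301", sel_1), (1, "99999.23_20080118", sel_1),
    (2, "76498.0_20150501", sel_2), (2, "76498.0_20150501", sel_2),
    (2, "76498.0_20150501", sel_2), (2, "351378.0_20180620", sel_2),
]
print(mark_selected(sample))
```

Which of several identical rows get the 1 depends only on row order here. In a dataframe, the occurrence number would correspond to a row number computed per (A, B_C) partition.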

SQL: triple-nested many to many query

I'm trying to fix my nested query. I have these tables:
cdu_groups_blocks
------------------------
|id |group_id |block_id|
------------------------
|1 |1 |1 |
|2 |1 |2 |
|3 |1 |3 |
------------------------
cdu_blocks: cdu_blocks_sessions:
-------------------------- ---------------------------
|id |name |enabled | |id |block_id |session_id |
-------------------------- ---------------------------
|1 |block_1 |1 | |1 |1 |1 |
|2 |block_2 |1 | |2 |1 |2 |
|3 |block_3 |1 | |3 |2 |3 |
-------------------------- |4 |2 |4 |
|5 |3 |5 |
|6 |3 |6 |
---------------------------
cdu_sessions: cdu_sessions_lessons
-------------------------- ----------------------------
|id |name |enabled | |id |session_id |lesson_id |
-------------------------- ----------------------------
|1 |session_1 |1 | |1 |1 |1 |
|2 |session_2 |1 | |2 |1 |2 |
|3 |session_3 |1 | |3 |2 |3 |
|4 |session_4 |0 | |4 |4 |4 |
|5 |session_5 |1 | |5 |4 |5 |
|6 |session_6 |0 | |6 |5 |6 |
-------------------------- ----------------------------
cdu_lessons:
--------------------------
|id |name |enabled |
--------------------------
|1 |lesson_1 |1 |
|2 |lesson_2 |1 |
|3 |lesson_3 |1 |
|4 |lesson_4 |1 |
|5 |lesson_5 |0 |
|6 |lesson_6 |0 |
--------------------------
It's a many-to-many which links to another many-to-many which links to another many-to-many.
Essentially I want to get all lesson_id(s) associated with a particular group_id.
So far I have this, but it's throwing up various SQL errors:
SELECT b.* FROM
(
SELECT block_id, group_id FROM cdu_groups_blocks
JOIN cdu_blocks ON cdu_blocks.id = cdu_groups_blocks.block_id
WHERE group_id = $group_id
AND enabled = 1
) AS b
INNER JOIN
(
SELECT l.* FROM
(
SELECT session_id, block_id FROM cdu_blocks_sessions
JOIN cdu_sessions ON cdu_sessions.id = cdu_blocks_sessions.session_id
AND enabled = 1
) AS s
INNER JOIN
(
SELECT lesson_id, session_id FROM cdu_sessions_lessons
JOIN cdu_lessons ON cdu_lessons.id = cdu_sessions_lessons.lesson_id
WHERE enabled = 1
) AS l
WHERE s.session_id = l.session_id
) AS sl
WHERE sl.block_id = g.block_id
Any help would be much appreciated!
sl.block_id comes from the s subquery, but the sl subselect only selects l.*, so block_id is not available outside it.
Add it to the select list. Change:
SELECT l.* FROM ...
to
SELECT l.*, s.block_id FROM ...
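As an alternative to the nested subselects, the whole chain can also be written as one flat sequence of joins, with each enabled check placed in its join condition. The sketch below uses that flat-join variant, not the nested query from the question, and runs it in SQLite through Python's sqlite3 module on the sample data. The column types and the variable name lesson_ids are assumptions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE cdu_groups_blocks (id INTEGER, group_id INTEGER, block_id INTEGER);
CREATE TABLE cdu_blocks (id INTEGER, name TEXT, enabled INTEGER);
CREATE TABLE cdu_blocks_sessions (id INTEGER, block_id INTEGER, session_id INTEGER);
CREATE TABLE cdu_sessions (id INTEGER, name TEXT, enabled INTEGER);
CREATE TABLE cdu_sessions_lessons (id INTEGER, session_id INTEGER, lesson_id INTEGER);
CREATE TABLE cdu_lessons (id INTEGER, name TEXT, enabled INTEGER);

INSERT INTO cdu_groups_blocks VALUES (1,1,1),(2,1,2),(3,1,3);
INSERT INTO cdu_blocks VALUES (1,'block_1',1),(2,'block_2',1),(3,'block_3',1);
INSERT INTO cdu_blocks_sessions VALUES (1,1,1),(2,1,2),(3,2,3),(4,2,4),(5,3,5),(6,3,6);
INSERT INTO cdu_sessions VALUES (1,'session_1',1),(2,'session_2',1),(3,'session_3',1),
                                (4,'session_4',0),(5,'session_5',1),(6,'session_6',0);
INSERT INTO cdu_sessions_lessons VALUES (1,1,1),(2,1,2),(3,2,3),(4,4,4),(5,4,5),(6,5,6);
INSERT INTO cdu_lessons VALUES (1,'lesson_1',1),(2,'lesson_2',1),(3,'lesson_3',1),
                               (4,'lesson_4',1),(5,'lesson_5',0),(6,'lesson_6',0);
""")

query = """
SELECT DISTINCT l.id
FROM cdu_groups_blocks gb
JOIN cdu_blocks b            ON b.id = gb.block_id  AND b.enabled = 1
JOIN cdu_blocks_sessions bs  ON bs.block_id = b.id
JOIN cdu_sessions s          ON s.id = bs.session_id AND s.enabled = 1
JOIN cdu_sessions_lessons sl ON sl.session_id = s.id
JOIN cdu_lessons l           ON l.id = sl.lesson_id AND l.enabled = 1
WHERE gb.group_id = ?
ORDER BY l.id
"""
# The group id is passed as a bound parameter instead of being interpolated.
lesson_ids = [row[0] for row in conn.execute(query, (1,))]
conn.close()
print(lesson_ids)
```

DISTINCT is there because a lesson could be reachable through more than one session or block. Lessons 4 and 5 are excluded because session_4 is disabled, and lesson 6 because it is disabled itself.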