Is branch pruning possible for recursive cte query - sql

This is inspired by question Retrieve a list of lists in one SQL statement - I have come up with a solution, but I have doubts on its efficiency.
To restate the problem:
we have 2 Tables: Person and Parent
Person contains basic data about each person
Parent is a join table relating person with its parents
each Person can have multiple parents
we want to receive each person data with list of all their ancestors - each ancestor in its own row
if there are no ancestors, we have only one row for that person, with null parentId
Here is the data format:
Person table
Id <PK>
Name
Parent table
Id<PK>
ParentPersonId <FK into Person >
Person has rows with values PK
1, 'Jim'
2, 'John'
3, 'Anna'
4, 'Peter'
Parent has rows with values
1, 2
1, 3
2, 3
3, 4
So person 1 has ancestors 2, 3, 4
I want the output in the following form
id name parentPersonId
--------------------------
1 Jim 2
1 Jim 3
1 Jim 4
2 John 3
2 John 4
3 Anna 4
4 Peter (null)
My solution used recursive CTE query, but my fear is that it produces too many rows, as each subtree can be entered multiple times. I needed to filter out duplicates with distinct, and execution plan shows that, even with this simple data, sorting for distinct takes 50% of the time. Here is my query:
WITH cte_org AS
(
SELECT per.id, per.name, par.parentPersonId
FROM Person per
LEFT JOIN Parent par ON per.id = par.id
UNION ALL
SELECT o.id, o.name, rec.parentPersonId
FROM Parent rec
INNER JOIN cte_org o ON o.parentPersonId = rec.id
WHERE rec.parentPersonId IS NOT NULL
)
SELECT DISTINCT *
FROM cte_org
ORDER BY id, parentPersonId;
http://sqlfiddle.com/#!18/d7d62/4
My questions:
can I somehow prune the already visited branches, so that recursive-CTE does not produce duplicate rows, and final distinct is not necessary
is recursive CTE a right approach to this problem?

On PostgreSQL you can acheive that by replacing UNION ALL with UNION.
So the query looks like that:
WITH RECURSIVE cte_org AS (
select per.id, per.name, par.parentPersonId
from Person per left join Parent par
on per.id = par.id
UNION
SELECT
o.id,
o.name,
rec.parentPersonId
FROM
Parent rec
INNER JOIN cte_org o
ON o.parentPersonId = rec.id
where rec.parentPersonId is not null
)
SELECT *
FROM cte_org
ORDER BY id, parentPersonId;
http://sqlfiddle.com/#!17/225cf4/4

Related

Find all ancestors without direct id-parentid in same table

I have a parent-child structure across two tables. The first table has BOM_ID for bills and ITEM_ID for associated children items. The other has BOM_ID and ITEM_ID for the parent item.
I am able to find the first level of parents' ITEM_ID with the following query
SELECT item_id
FROM bomversion
WHERE bom_id IN (SELECT bom_id FROM bom WHERE item_id = 1)
So, in order to find all ancestors, I would have to repeat this step. I've tried to look at CTE and recursion techniques but all examples have parentid and childid in the same table. I cannot figure this one out.
If 1 is child of 2, 2 is child of 3, 3 is child of 4, and 2 is also child of 5, I am looking for the following result:
ChildID
ParentID
1
2
2
3
2
5
3
4
The starting point will be a specific ChildID.
S O L U T I O N
Based on Adnan Sharif's proposal, I found the solution for my problem:
WITH items_CTE AS (
-- create the mapping of items to consider
SELECT
B.ITEMID AS Child_id,
BV.ITEMID AS Parent_Id
FROM BOM AS B
INNER JOIN BOMVERSION AS BV
ON B.BOMID = BV.BOMID
), parent_child_cte AS (
-- select those items as initial query
SELECT
Child_id,
Parent_id
FROM items_CTE
WHERE Child_id = '111599' -- this is the starting point
UNION ALL
-- recursive approach to find all the ancestors
SELECT
c.Child_Id,
c.Parent_Id
FROM items_CTE c
JOIN parent_child_cte pc
ON pc.Parent_Id = c.Child_id
)
SELECT * FROM parent_child_cte
From the dbfiddle that you shared in the comment, if I understand you correctly, you want to have the rows showing all the parents of a child.
For example, lets consider the hierarchy, 1 is a child of 2, 2 is a child of 3, and 3 is a child of 4. You want the final result as,
child_id
parent_id
1
2
1
3
1
4
2
3
2
4
3
4
If that's the case, we can use recursive CTE to build that table. First of all, we need to be build a relation between those two tables based on bom_id and then we need to find out parent-child relation. After that we will add rows based on the initial query. Please see the below code.
WITH RECURSIVE items_CTE AS (
-- create the mapping of items to consider
SELECT
B.Item_id AS Child_id,
BV.Item_id AS Parent_Id
FROM BOM AS B
INNER JOIN BOMVERSION AS BV
ON B.bom_id = BV.bom_id
), parent_child_cte AS (
-- select those items as initial query
SELECT
Child_id,
Parent_id
FROM items_CTE
UNION
-- recursive approach to find all the ancestors
SELECT
parent_child_cte.Child_Id,
items_CTE.Parent_Id
FROM items_CTE
INNER JOIN parent_child_cte
ON parent_child_cte.Parent_Id = items_CTE.Child_id
)
SELECT * FROM parent_child_cte

How to Limit Results Per Match on a Left Join - SQL Server

I have a table with student info [STU] and a table with parent info [PAR]. I want to return an email address for each student, but just one. So I run this query:
SELECT [STU].[ID], [PAR].[EM]
FROM (SELECT [STU].* FROM DB1.STU)
STU LEFT JOIN (SELECT [PAR].* FROM DB1.PAR) PAR ON [STU].[ID] = [PAR].[ID]
This gives me the below table:
Student ID ParentEmail
1 jim#email.com
1 sarah#email.com
2 paul#email.com
2 tim#email.com
3 bill#email.com
3 frank#email.com
3 joyce#email.com
4 greg#email.com
5 tony#email.com
5 sam#email.com
Each student has multiple parent emails, but I only want one. In other words, I want the output to look like this:
Student ID ParentEmail
1 jim#email.com
2 paul#email.com
3 frank#email.com
4 greg#email.com
5 sam#email.com
I've tried so many things. I've tried using GROUP BY and MIN/MAX and I've tried complex CASE statements, and I've tried COALESCE but I just can't seem to figure it out.
I think OUTER APPLY is the simplest method:
SELECT [STU].[ID], [PAR].[EM]
FROM DB1.STU OUTER APPLY
(SELECT TOP (1) [PAR].*
FROM DB1.PAR
WHERE [STU].[ID] = [PAR].[ID]
) PAR;
Normally, there would be an ORDER BY in the subquery, to give you control over which email you want -- the longest, shortest, oldest, or whatever. Without an ORDER BY it returns just one email, which is what you are asking for.
If you just want one column from the parent table, a simple approach is a correlated subquery:
select
s.id student_id,
(select max(p.em) from db1.par p where p.id = s.id) parent_email
from db1.stu s
This gives you the greatest parent email per student.

Update multiple row values to same row and different columns

I was trying to update table columns from another table.
In person table, there can be multiple contact persons with same inst_id.
I have a firm table, which will have latest 2 contact details from person table.
I am expecting the firm tables as below:
If there is only one contact person, update person1 and email1. If there are 2, update both. If there is 3, discard the 3rd one.
Can someone help me on this?
This should work:
;with cte (rn, id, inst_id, person_name, email) as (
select row_number() over (partition by inst_id order by id) rn, *
from person
)
update f
set
person1 = cte1.person_name,
email1 = cte1.email,
person2 = cte2.person_name,
email2 = cte2.email
from firm f
left join cte cte1 on f.inst_id = cte1.inst_id and cte1.rn = 1
left join cte cte2 on f.inst_id = cte2.inst_id and cte2.rn = 2
The common table expression (cte) used as a source for the update numbers rows in the person table, partitioned by inst_id, and then the update joins the cte twice (for top 1 and top 2).
Sample SQL Fiddle
I think you don't have to bother yourself with this update, if you rethink your database structure. One great advantage of relational databases is, that you don't need to store the same data several times in several tables, but have one single table for one kind of data (like the person's table in your case) and then reference it (by relationships or foreign keys for example).
So what does this mean for your example? I suggest, to create a institution's table where you insert two attributes like contactperson1 and contactperson2: but dont't insert all the contact details (like email and name), just the primary key of the person and make it a foreign key.
So you got a table 'Person', that should look something like this:
ID INSTITUTION_ID NAME EMAIL
1 100 abc abc#inst.com
2 101 efg efg#xym.com
3 101 ijk ijk#fg.com
4 101 rtw rtw#rtw.com
...
And a table "Institution" like:
ID CONTACTPERSON1 CONTACTPERSON2
100 1 NULL
101 2 3
...
If you now want to change the email adress, just update the person's table. You don't need to update the firm's table.
And how do you get your desired "table" with the two contact persons' details? Just make a query:
SELECT i.id, p1.name, p1.email, p2.name, p2.email
FROM institution i LEFT OUTER JOIN person p1 ON (i.contactperson1 = p1.id)
LEFT OUTER JOIN person p2 ON (i.contactperson2 = p2.id)
If you need this query often and access it like a "table" just store it as a view.

Joining 3 Tables Using Newest Rows

I have 3 tables in my database: children, families, statuslog
Every time a child is checked in or out of the database, it is updated in the statuslog. I've done this a long time ago, but I can't seem to figure out how to do it anymore. I want to create a new view that joins all 3 tables, but I only want the newest entry from statuslog (by using the highest id).
For example, statuslog looks like this:
childID researcher status id
1 Dr. A out 1
1 Dr. A in 2
1 Dr. B out 3
1 Dr. B in 4
2 Dr. C out 5
2 Dr. C in 6
3 Dr. B out 7
3 Dr. B in 8
This is what I want to do:
SELECT *
FROM children, families, statuslog
WHERE children.familyID = families.familyID AND children.childID = statuslog.childID
Obviously, this will return the children+families tuples coupled with every single log entry, but I can't remember how to only combine it with the newest log entry.
Any help would be appreciated!
Aggregate query with max(id) retrieves last ID given a childID. This is then joined to statuslog to retrieve other columns.
SELECT *
FROM children
INNER JOIN families
ON children.familyID = families.familyID
INNER JOIN
(
SELECT childID, researcher, status
FROM statuslog
INNER JOIN
(
SELECT childID, max(ID) ID
FROM statuslog
GROUP BY childID
) lastSL
ON statuslog.childID = lastSL.childid
AND statuslog.ID = lastSL.ID
) sl
ON children.childID = sl.childID
This seems to be the typical greatest-n-per-group in which the higher id is interpreted as the newest. This query should do the trick:
select * from (
select s1.* from statusLog s1
left join statusLog s2
on s1.childId = s2.childId and s1.id < s2.id
where s2.id is null
) final
join children c on c.childId = final.childId
join families f on f.familyId = c.familyId
Correct any syntactical errors.

Better way to demand, in SQL, that a column contains every specified value

Imagine you have two tables, with a one to many relationship.
For this example, I will suggest that there are two tables: Person, and Homes.
The person table holds a persons name, and gives them an ID. The homes table, holds the association of homes to a person. PID joins to "Person.ID"
And, in this tiny DB, a person can have no homes, or many homes.
I hope I drew that right.
How do I write a select, that returns everyone with every specified house type?
Let's say these are valid "Types" in the homes table:
Cottage, Main, Mansion, Spaceport.
I want to return everyone, in the Person table, who has a spaceport and a Cottage.
The best I could come up with was this:
SELECT DISTINCT( p.name ) AS name
FROM person p
INNER JOIN homes h ON h.pid = p.id
WHERE 'spaceport' in (
SELECT DISTINCT( type ) AS type
FROM homes
WHERE pid = p.id
)
AND 'cottage' in (
SELECT DISTINCT( type ) AS type
FROM homes
WHERE pid = p.id
)
When I wrote that, it works, but I'm pretty sure there has to be a better way.
The HAVING clause here will guarantee that the persons returned have both types, not just one or the other.
SELECT p.name
FROM person p
INNER JOIN homes h
ON p.id = h.pid
AND h.type IN ('spaceport', 'cottage')
GROUP BY p.name
HAVING COUNT(DISTINCT h.type) = 2
select * from homes;
home_id person_id type
--
1 1 cottage
2 1 mansion
3 2 cottage
4 3 mansion
5 4 cottage
6 4 cottage
To find the id numbers of every person who has both a cottage and a mansion, group by the id number, restrict the output to cottages and mansions, and count the distinct types.
select person_id
from homes
where type in ('cottage','mansion')
group by person_id
having count(distinct type) = 2;
person_id
--
1
You can use this query in a join to get all the columns from the person table.
select person.*
from person
inner join (select person_id
from homes
where type in ('cottage','mansion')
group by person_id
having count(distinct type) = 2) T
on person.person_id = T.person_id;
Thanks to Joe for pointing out an error in my count().
Not sure about the performance on this one, but here goes:
SELECT PID FROM (
SELECT PID, COUNT(PID) cnt FROM (
SELECT DISTINCT PID, Type FROM Homes
WHERE Type IN ('Type1', 'Type2', 'Type3')
) a
GROUP BY PID
) b
WHERE b.cnt = 3
You'd have to dynamically generate your IN clause as well as the WHERE b.CNT clause.