T-SQL get root node in hierarchy - sql

So I have two tables structured like so:
CREATE TABLE #nodes(node int NOT NULL);
ALTER TABLE #nodes ADD CONSTRAINT PK_nodes PRIMARY KEY CLUSTERED (node);
CREATE TABLE #arcs(child_node int NOT NULL, parent_node int NOT NULL);
ALTER TABLE #arcs ADD CONSTRAINT PK_arcs PRIMARY KEY CLUSTERED (child_node, parent_node);
INSERT INTO #nodes(node)
VALUES (1), (2), (3), (4), (5), (6), (7);
INSERT INTO #arcs(child_node, parent_node)
VALUES (2, 3), (3, 4), (2, 6), (6, 7);
If I have two nodes, let's say 1 and 2, I want a list of their root nodes. In this case it would be 1, 4, and 7. How can I write a query to get that information?
I took a stab at writing it but ran into the issue that I can't use a LEFT JOIN in the recursive part of a CTE for some unknown reason. Here is the query that would work if I were allowed to do a LEFT JOIN.
WITH root_nodes
AS (
-- Grab all the leaf nodes I care about and their parent
SELECT n.node as child_node, a.parent_node
FROM #nodes n
LEFT JOIN #arcs a
ON n.node = a.child_node
WHERE n.node IN (1, 2)
UNION ALL
-- Grab all the parent nodes
SELECT rn.parent_node as child_node, a.parent_node
FROM root_nodes rn
LEFT JOIN #arcs a -- <-- LEFT JOINS are Illegal for some reason :(
ON rn.parent_node = a.child_node
WHERE rn.parent_node IS NOT NULL
)
SELECT DISTINCT rn.child_node as root_node
FROM root_nodes rn
WHERE rn.parent_node IS NULL
Is there a way I can restructure the query to get what I want? I can't restructure the data and I would really prefer to stay away from temporary tables or anything expensive.
Thanks,
Raul

How about moving the LEFT JOIN out of the CTE?
WITH root_nodes
AS (
-- Grab all the leaf nodes I care about
SELECT NULL as child_node, n.node as parent_node
FROM #nodes n
WHERE n.node IN (1, 2)
UNION ALL
-- Grab all the parent nodes
SELECT rn.parent_node as child_node, a.parent_node
FROM root_nodes rn
JOIN #arcs a
ON rn.parent_node = a.child_node
)
SELECT DISTINCT rn.parent_node AS root_node
FROM root_nodes rn
LEFT JOIN #arcs a
ON rn.parent_node = a.child_node
WHERE a.parent_node IS NULL
The result set is 1, 4, 7.
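To check the idea outside SQL Server, here is a minimal sketch that runs the same shape of query through Python's built-in sqlite3 module. The table names without the # prefix, the CTE name reachable, and the RECURSIVE keyword are adaptations for SQLite and are not part of the original answer; the data is the sample from the question.

```python
import sqlite3

# In-memory SQLite database standing in for the #nodes / #arcs temp tables.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE nodes(node INTEGER PRIMARY KEY);
    CREATE TABLE arcs(child_node INTEGER NOT NULL,
                      parent_node INTEGER NOT NULL,
                      PRIMARY KEY (child_node, parent_node));
    INSERT INTO nodes(node) VALUES (1), (2), (3), (4), (5), (6), (7);
    INSERT INTO arcs(child_node, parent_node)
    VALUES (2, 3), (3, 4), (2, 6), (6, 7);
""")

# The recursive part uses only an inner join; the LEFT JOIN that detects
# "has no parent" is applied afterwards, outside the CTE.
ROOTS_SQL = """
WITH RECURSIVE reachable(node) AS (
    SELECT n.node FROM nodes n WHERE n.node IN (1, 2)
    UNION ALL
    SELECT a.parent_node
    FROM reachable r
    JOIN arcs a ON a.child_node = r.node
)
SELECT DISTINCT r.node AS root_node
FROM reachable r
LEFT JOIN arcs a ON a.child_node = r.node
WHERE a.parent_node IS NULL
ORDER BY root_node
"""

root_nodes = [row[0] for row in conn.execute(ROOTS_SQL)]
print(root_nodes)
```

The CTE collects every node reachable by walking upward from the starting nodes, and the outer query keeps only those that never appear as a child in the arcs table.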

Related

How to select from a master table but replace certain rows using a secondary, linked table?

I have two tables with a foreign key relationship on an ID. I'll refer to them as master and secondary to make things easier, and not worry about the FK for now. Here is a cut-down, easy-to-reproduce example using table variables to represent the problem:
DECLARE @Master TABLE (
[MasterID] Uniqueidentifier NOT NULL
,[Description] NVARCHAR(50)
)
DECLARE @Secondary TABLE (
[SecondaryID] Uniqueidentifier NOT NULL
,[MasterID] Uniqueidentifier NOT NULL
,[OtherInfo] NVARCHAR(50)
)
INSERT INTO @Master ([MasterID], [Description])
VALUES ('0C1F1A0C-1DB5-4FA2-BC70-26AA9B10D5C3', 'Test')
,('2696ECD2-FFDB-4E26-83D0-F146ED419C9C', 'Test 2')
,('F21568F0-59C5-4950-B936-AA73DA6009B5', 'Test 3')
INSERT INTO @Secondary (SecondaryID, MasterID, OtherInfo)
VALUES ('514673A6-8B5C-429B-905F-15BD8B55CB5D','0C1F1A0C-1DB5-4FA2-BC70-26AA9B10D5C3','Other info')
SELECT [MasterID], [Description], NULL AS [OtherInfo] FROM @Master
UNION
SELECT S.[MasterID], M.[Description], [OtherInfo] FROM @Secondary S
JOIN @Master M ON M.MasterID = S.MasterID
With the results.....
0C1F1A0C-1DB5-4FA2-BC70-26AA9B10D5C3 Test NULL
0C1F1A0C-1DB5-4FA2-BC70-26AA9B10D5C3 Test Other info
F21568F0-59C5-4950-B936-AA73DA6009B5 Test 3 NULL
2696ECD2-FFDB-4E26-83D0-F146ED419C9C Test 2 NULL
.... I would like to return only the record from @Secondary if there is a duplicate MasterID, so this is my expected output:
0C1F1A0C-1DB5-4FA2-BC70-26AA9B10D5C3 Test Other info
F21568F0-59C5-4950-B936-AA73DA6009B5 Test 3 NULL
2696ECD2-FFDB-4E26-83D0-F146ED419C9C Test 2 NULL
I tried inserting my union query into a temporary table, then using a CTE with the partition function. This kind of works but unfortunately returns the row from the @Master table rather than the @Secondary table (regardless of the order I select). See below.
DECLARE @Results TABLE (MasterID UNIQUEIDENTIFIER,[Description] NVARCHAR(50),OtherInfo NVARCHAR(50))
INSERT INTO @Results
SELECT [MasterID], [Description], NULL AS [OtherInfo] FROM @Master
UNION
SELECT S.[MasterID], M.[Description], [OtherInfo] FROM @Secondary S
JOIN @Master M ON M.MasterID = S.MasterID
;WITH CTE AS (
SELECT *, RN= ROW_NUMBER() OVER (PARTITION BY [MasterID] ORDER BY [Description] DESC) FROM @Results
)
SELECT * FROM CTE WHERE RN =1
Results:
0C1F1A0C-1DB5-4FA2-BC70-26AA9B10D5C3 Test NULL 1
F21568F0-59C5-4950-B936-AA73DA6009B5 Test 3 NULL 1
2696ECD2-FFDB-4E26-83D0-F146ED419C9C Test 2 NULL 1
Note that I am not just trying to select the rows which have a value for OtherInfo; this is just to help differentiate the two tables in the result set.
Just to reiterate, what I need is to return only the rows present in @Secondary when there is a duplicate MasterID. If @Secondary has a row for a particular MasterID, I don't need the row from @Master. I hope this makes sense.
What is the best way to do this? I am happy to redesign my database structure. I'm effectively trying to have a master list of items but sometimes take one of those and assign extra info to it + tie it to another ID. In this instance, that record replaces the master list.
You are way overcomplicating this. All you need is a left join.
SELECT M.[MasterID], M.[Description], S.[OtherInfo] FROM @Master M
LEFT JOIN @Secondary S ON M.MasterID = S.MasterID
Union seems to be the wrong approach... I would suggest a left join:
SELECT m.[MasterID], m.[Description], s.[OtherInfo]
FROM @Master m
LEFT JOIN @Secondary s ON s.MasterID = m.MasterID
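As a quick check that a left join produces the expected output from the question, here is a small sketch using Python's sqlite3 module. The plain table names Master and Secondary and the TEXT columns are stand-ins chosen for SQLite; the rows are the sample data from the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Master(MasterID TEXT NOT NULL, Description TEXT);
    CREATE TABLE Secondary(SecondaryID TEXT NOT NULL,
                           MasterID TEXT NOT NULL,
                           OtherInfo TEXT);
    INSERT INTO Master(MasterID, Description) VALUES
        ('0C1F1A0C-1DB5-4FA2-BC70-26AA9B10D5C3', 'Test'),
        ('2696ECD2-FFDB-4E26-83D0-F146ED419C9C', 'Test 2'),
        ('F21568F0-59C5-4950-B936-AA73DA6009B5', 'Test 3');
    INSERT INTO Secondary(SecondaryID, MasterID, OtherInfo) VALUES
        ('514673A6-8B5C-429B-905F-15BD8B55CB5D',
         '0C1F1A0C-1DB5-4FA2-BC70-26AA9B10D5C3', 'Other info');
""")

# Every master row appears once; OtherInfo is filled in only where a
# secondary row exists for that MasterID, otherwise it is NULL (None).
rows = conn.execute("""
    SELECT m.MasterID, m.Description, s.OtherInfo
    FROM Master m
    LEFT JOIN Secondary s ON s.MasterID = m.MasterID
    ORDER BY m.Description
""").fetchall()

for row in rows:
    print(row)
```

One caveat worth keeping in mind: if a master row could have several secondary rows, the left join returns one output row per secondary row, so the one-row-per-master result relies on at most one match per MasterID.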

HAVING clause with subquery -- Checking if group has at least one row matching conditions

Suppose I have the following table
DROP TABLE IF EXISTS #toy_example
CREATE TABLE #toy_example
(
Id int,
Pet varchar(10)
);
INSERT INTO #toy_example
VALUES (1, 'dog'),
(1, 'cat'),
(1, 'emu'),
(2, 'cat'),
(2, 'turtle'),
(2, 'lizard'),
(3, 'dog'),
(4, 'elephant'),
(5, 'cat'),
(5, 'emu')
and I want to fetch all Ids that have certain pets (for example either cat or emu, so Ids 1, 2 and 5).
DROP TABLE IF EXISTS #Pets
CREATE TABLE #Pets
(
Animal varchar(10)
);
INSERT INTO #Pets
VALUES ('cat'),
('emu')
SELECT Id
FROM #toy_example
GROUP BY Id
HAVING COUNT(
CASE
WHEN Pet IN (SELECT Animal FROM #Pets)
THEN 1
END
) > 0
The above gives me the error Cannot perform an aggregate function on an expression containing an aggregate or a subquery. I have two questions:
Why is this an error? If I instead hard code the subquery in the HAVING clause, i.e. WHEN Pet IN ('cat','emu'), then this works. Is there a reason why SQL Server (I've checked with SQL Server 2017 and 2008) does not allow this?
What would be a nice way to do this? Note that the above is just a toy example. The real problem has many possible "Pets", which I do not want to hard code. It would be nice if the suggested method could check for multiple other similar conditions too in a single query.
If I followed you correctly, you can just join and aggregate:
select t.id, count(*) nb_of_matches
from #toy_example t
inner join #pets p on p.animal = t.pet
group by t.id
The inner join eliminates records from #toy_example that have no match in #pets. Then, we aggregate by id and count how many records remain in each group.
If you want to retain records that have no match in #pets and display them with a count of 0, then you can left join instead:
select t.id, count(*) nb_of_records, count(p.animal) nb_of_matches
from #toy_example t
left join #pets p on p.animal = t.pet
group by t.id
How about this approach?
SELECT e.Id
FROM #toy_example e JOIN
#pets p
ON e.pet = p.animal
GROUP BY e.Id
HAVING COUNT(DISTINCT e.pet) = (SELECT COUNT(*) FROM #pets);
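The answers above cover two different requirements, so a side-by-side sketch may help. It uses Python's sqlite3 module, with toy_example and Pets as plain tables standing in for the temp tables (an adaptation for SQLite). The first query keeps Ids with at least one listed pet, which is what the question asks for; the second keeps only Ids that have every listed pet, which is what the COUNT(DISTINCT ...) comparison checks.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE toy_example(Id INTEGER, Pet TEXT);
    CREATE TABLE Pets(Animal TEXT);
    INSERT INTO toy_example VALUES
        (1, 'dog'), (1, 'cat'), (1, 'emu'),
        (2, 'cat'), (2, 'turtle'), (2, 'lizard'),
        (3, 'dog'), (4, 'elephant'),
        (5, 'cat'), (5, 'emu');
    INSERT INTO Pets VALUES ('cat'), ('emu');
""")

# Ids with AT LEAST ONE pet from the list: a plain EXISTS filter,
# no aggregate wrapped around a subquery.
any_match = [r[0] for r in conn.execute("""
    SELECT DISTINCT t.Id
    FROM toy_example t
    WHERE EXISTS (SELECT 1 FROM Pets p WHERE p.Animal = t.Pet)
    ORDER BY t.Id
""")]

# Ids with ALL pets from the list: join, then compare the number of
# distinct matched pets with the size of the list.
all_match = [r[0] for r in conn.execute("""
    SELECT t.Id
    FROM toy_example t
    JOIN Pets p ON p.Animal = t.Pet
    GROUP BY t.Id
    HAVING COUNT(DISTINCT t.Pet) = (SELECT COUNT(*) FROM Pets)
    ORDER BY t.Id
""")]

print(any_match, all_match)
```

Both forms keep the subquery out of the aggregate's argument, which is the construct the error message in the question complains about.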

SQL get Parent where Children have specific values

Given a Parent with the field id and a Child relation with parent_id and name, what would a query look like to get all Parents that have two children, one named 'John' and one named 'Mike'? My problem is that I am not able to build a query which returns only the Parents having both children. I used WHERE ... IN ('John', 'Mike'), so I also get the Parents which have just one child named 'John' or 'Mike'. But I want only the Parents with both children.
SELECT * FROM Parent
JOIN Child ON Child.parent_id = Parent.id
WHERE Child.name IN ('John', 'Mike')
My query is of course more complex and this is only an abstraction of what I want to achieve. I have in mind that I first need to join the children on parent_id and do something with that, but I have no idea what.
You can do two joins and look for your specific records. This example shows that parent 1 will return with both kiddos, but not parent 2 that only has a Mike.
DECLARE @parent TABLE (ID INT)
DECLARE @child TABLE (ID INT, parentID INT, name VARCHAR(100))
INSERT INTO @parent
VALUES
(1),
(2),
(3),
(4),
(5),
(6)
INSERT INTO @child (ID, parentID, name)
VALUES
(1, 1, 'Mike'),
(2, 1, 'John'),
(3, 2, 'Mike'),
(4, 2, 'Bill'),
(5, 3, 'Dave'),
(6, 4, 'Sam')
SELECT p.*
FROM @parent p
INNER JOIN @child c1
ON c1.parentID = p.ID
AND c1.name = 'Mike'
INNER JOIN @child c2
ON c2.parentID = p.ID
AND c2.name = 'John'
Try having two steps in the where clause. Both conditions will have to be true to return a parent record.
where parent.id in (select parent_id from child where child.name='John')
and parent.id in (select parent_id from child where child.name='Mike')
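Something like this would work in Postgres, using HAVING. Note the UNION ALL: a plain UNION would collapse the two identical (parent_id, 1) rows into one, so the sum could never reach 2. The DISTINCT in each branch keeps a parent with two children of the same name from being counted twice.
SELECT parent_id, SUM(num) FROM (
SELECT DISTINCT parent_id, 1 as num FROM Child Where name = 'John'
UNION ALL
SELECT DISTINCT parent_id, 1 as num FROM Child Where name = 'Mike'
) parents
GROUP BY parent_id HAVING SUM(num) = 2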
So, I added the solution with the double join to an Ecto query and it passed my tests :)
from c in Child,
join: p in Parent, on: c.parent_id == p.id,
join: cc in Child, on: p.id == cc.parent_id,
where: c.name == ^"John",
where: cc.name == ^"Mike",
select: count(p.id)
Thanks for the ideas and the fast help :)
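For comparison with the double join, a single pass with GROUP BY and HAVING COUNT(DISTINCT name) also finds parents that have both children, and it extends to longer name lists by changing only the IN list and the count. This is a sketch run through Python's sqlite3 module; the Parent and Child table layout follows the question's description, and the rows are borrowed from the sample data in the first answer.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Parent(id INTEGER PRIMARY KEY);
    CREATE TABLE Child(id INTEGER PRIMARY KEY,
                       parent_id INTEGER REFERENCES Parent(id),
                       name TEXT);
    INSERT INTO Parent VALUES (1), (2), (3), (4), (5), (6);
    INSERT INTO Child VALUES
        (1, 1, 'Mike'), (2, 1, 'John'),
        (3, 2, 'Mike'), (4, 2, 'Bill'),
        (5, 3, 'Dave'), (6, 4, 'Sam');
""")

# Keep only the rows for the wanted names, then require that each
# parent group contains both distinct names.
both = [r[0] for r in conn.execute("""
    SELECT c.parent_id
    FROM Child c
    WHERE c.name IN ('John', 'Mike')
    GROUP BY c.parent_id
    HAVING COUNT(DISTINCT c.name) = 2
    ORDER BY c.parent_id
""")]

print(both)
```

Parent 2 has a Mike but no John, so it drops out; COUNT(DISTINCT ...) also means a parent with two children named Mike and no John is not mistaken for a match.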

SQL - select selective row multiple times

I need to produce mailing labels for my company and I thought I would do a query for that:
I have 2 tables - tblAddress , tblContact.
In tblContact I have "addressNum" which is a foreign key of address and "labelsNum" column that represents the number of times the address should appear in the labels sheet.
I need to create an inner join of tblcontact and tbladdress by addressNum,
but if labelsNum exists more than once it should be displayed as many times as labelsNum is.
I suggest using a recursive query to do the correct number of iterations for each row.
Here is the code:
;WITH recurs AS (
SELECT *, 1 AS LEVEL
FROM tblContact
UNION ALL
SELECT t1.*, LEVEL + 1
FROM tblContact t1
INNER JOIN
recurs t2
ON t1.addressnum = t2.addressnum
AND t2.labelsnum > t2.LEVEL
)
SELECT *
FROM recurs
ORDER BY addressnum
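A variant of the same recursive idea, sketched with Python's sqlite3 module: each contact row carries its own copy counter, so the recursion never joins back on addressNum and contacts that share an address do not multiply each other. The column names ContactID, Contact, and Address, the counter copy_no, and the sample rows are assumptions made for illustration, since the question does not show the table definitions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Hypothetical schema: only addressNum and labelsNum come from the question.
    CREATE TABLE tblAddress(AddressID INTEGER PRIMARY KEY, Address TEXT);
    CREATE TABLE tblContact(ContactID INTEGER PRIMARY KEY, Contact TEXT,
                            AddressNum INTEGER, labelsNum INTEGER);
    INSERT INTO tblAddress VALUES (1, 'foo1'), (2, 'foo2');
    INSERT INTO tblContact VALUES (1, 'Ann', 1, 1), (2, 'Bob', 2, 3);
""")

LABELS_SQL = """
WITH RECURSIVE labels(ContactID, Contact, AddressNum, labelsNum, copy_no) AS (
    -- Anchor: first copy of every contact that needs at least one label.
    SELECT ContactID, Contact, AddressNum, labelsNum, 1
    FROM tblContact
    WHERE labelsNum >= 1
    UNION ALL
    -- Recursive step: emit the next copy until labelsNum is reached.
    SELECT ContactID, Contact, AddressNum, labelsNum, copy_no + 1
    FROM labels
    WHERE copy_no < labelsNum
)
SELECT a.Address, l.Contact, l.copy_no
FROM labels l
JOIN tblAddress a ON a.AddressID = l.AddressNum
ORDER BY l.ContactID, l.copy_no
"""

label_rows = conn.execute(LABELS_SQL).fetchall()
for row in label_rows:
    print(row)
```

In T-SQL the RECURSIVE keyword is omitted, and the default recursion limit of 100 levels would need an OPTION (MAXRECURSION n) hint if any labelsNum exceeds it.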
Wouldn't the script return multiple lines for different contacts anyway?
CREATE TABLE tblAddress (
AddressID int IDENTITY
, [Address] nvarchar(35)
);
CREATE TABLE tblContact (
ContactID int IDENTITY
, Contact nvarchar(35)
, AddressNum int
, labelsNum int
);
INSERT INTO tblAddress VALUES ('foo1');
INSERT INTO tblAddress VALUES ('foo2');
INSERT INTO tblContact VALUES ('bar1', 1, 1);
INSERT INTO tblContact VALUES ('bar2', 2, 2);
INSERT INTO tblContact VALUES ('bar3', 2, 2);
SELECT * FROM tblAddress a JOIN tblContact c ON a.AddressID = c.AddressNum
This yields 3 rows on my end. The labelsNum column seems redundant to me. If you add a third contact for address foo2, you would have to update all labelsNum columns for all records referencing foo2 in order to keep things consistent.
The amount of labels is already determined by the amount of different contacts.
Or am I missing something?

How to show fields from most recently added detail in a view?

QUERY:
drop table #foot
create table #foot (
id int primary key not null,
name varchar(50) not null
)
go
drop table #note
create table #note (
id int primary key not null,
note varchar(MAX) not null,
foot_id int not null references #foot(id)
)
go
insert into #foot values
(1, 'Joe'), (2, 'Mike'), (3, 'Rob')
go
insert into #note (id, note, foot_id) values (1, 'Joe note 1', 1)
go
insert into #note (id, note, foot_id) values(2, 'Joe note 2', 1)
go
insert into #note (id, note, foot_id) values(3, 'Mike note 1', 2)
go
select F.name, N.note, N.id
from #foot F left outer join #note N on N.foot_id=F.id
RESULT:
QUESTION:
How can I create a view/query resulting in one row for each master record (#foot) along with fields from the most recently inserted detail (#note), if any?
GOAL:
(NOTE: the way I would tell which one is most recent is the id which would be higher for newer records)
select t.name, t.note, t.id
from (select F.name, N.note, N.id,
ROW_NUMBER() over(partition by F.id order by N.id desc) as RowNum
from #foot F
left outer join #note N
on N.foot_id=F.id) t
where t.RowNum = 1
Assuming the ID created in the #note table is always incremental (imposed by using IDENTITY or by controlling the inserts so that each new id exceeds the current max value), you can use the following query (which uses the RANK function):
WITH Dat AS
(
SELECT f.name,
n.note,
n.id,
RANK() OVER(PARTITION BY n.foot_id ORDER BY n.id DESC) rn
FROM #foot f LEFT OUTER JOIN #note n
ON n.foot_id = f.id
)
SELECT *
FROM Dat
WHERE rn = 1
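A third way to get the same result, without window functions, is to left join each master row to the single note whose id is the maximum for that master. The sketch below runs it through Python's sqlite3 module, with foot and note as plain tables standing in for the temp tables (an adaptation for SQLite). It relies on the same assumption as the question: a higher id means a newer note.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE foot(id INTEGER PRIMARY KEY, name TEXT NOT NULL);
    CREATE TABLE note(id INTEGER PRIMARY KEY,
                      note TEXT NOT NULL,
                      foot_id INTEGER NOT NULL REFERENCES foot(id));
    INSERT INTO foot VALUES (1, 'Joe'), (2, 'Mike'), (3, 'Rob');
    INSERT INTO note VALUES
        (1, 'Joe note 1', 1),
        (2, 'Joe note 2', 1),
        (3, 'Mike note 1', 2);
""")

# For each foot row, join only the note with the highest id for that foot.
# Masters without notes keep a row with NULLs thanks to the LEFT JOIN.
LATEST_SQL = """
SELECT f.name, n.note, n.id
FROM foot f
LEFT JOIN note n
  ON n.id = (SELECT MAX(n2.id) FROM note n2 WHERE n2.foot_id = f.id)
ORDER BY f.id
"""

latest = conn.execute(LATEST_SQL).fetchall()
for row in latest:
    print(row)
```

Because the join condition pins exactly one note id per master, the query returns one row per #foot record even if the ordering column has no ties to break.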