Azure SQL: Delete User records with redundant nested references - sql

I have a fairly complex requirement: delete UserSubscriptions records with redundant options. If a UserId is associated with two or more subscriptions that have the same options, I need to keep the first subscription and delete the rest. Below are some scenarios.
Scenario 1: UserId 1 has three subscriptions (SubscriptionId 1, 2 and 3) and all of them have the same options (Email, Call, Fax), so for UserId 1 the SubscriptionOptionIds 4, 5, 6, 7, 8 and 9 have to be deleted from UserSubscriptions.
Scenario 2: UserId 2 has two subscriptions (SubscriptionId 1 and 2) that do NOT have the same options, so nothing needs to be deleted.
Scenario 3: UserId 3 has two subscriptions (SubscriptionId 1 and 2) and both have the same option (Email), so for UserId 3 the SubscriptionOptionId 3 has to be deleted from UserSubscriptions.
Below is the DDL and DML for my tables:
CREATE TABLE [Options](
[Id] [int] IDENTITY(1,1) NOT NULL,
[Name] [nvarchar](255) NOT NULL
) ON [PRIMARY]
GO
ALTER TABLE [Options] ADD CONSTRAINT [PK_Options] PRIMARY KEY CLUSTERED
(
[Id] ASC
)WITH (STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF, ONLINE = OFF, OPTIMIZE_FOR_SEQUENTIAL_KEY = OFF) ON [PRIMARY]
GO
CREATE TABLE [Subscriptions](
[Id] [int] IDENTITY(1,1) NOT NULL,
[Name] [nvarchar](255) NOT NULL
) ON [PRIMARY]
GO
ALTER TABLE [Subscriptions] ADD CONSTRAINT [PK_Subscriptions] PRIMARY KEY CLUSTERED
(
[Id] ASC
)WITH (STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF, ONLINE = OFF, OPTIMIZE_FOR_SEQUENTIAL_KEY = OFF) ON [PRIMARY]
GO
CREATE TABLE [SubscriptionsOptions](
[Id] [int] IDENTITY(1,1) NOT NULL,
[OptionId] [int] NOT NULL,
[SubscriptionId] [int] NOT NULL
) ON [PRIMARY]
GO
ALTER TABLE [SubscriptionsOptions] ADD CONSTRAINT [PK_SubscriptionsOptions] PRIMARY KEY CLUSTERED
(
[Id] ASC
)WITH (STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF, ONLINE = OFF, OPTIMIZE_FOR_SEQUENTIAL_KEY = OFF) ON [PRIMARY]
GO
ALTER TABLE [SubscriptionsOptions] WITH CHECK ADD CONSTRAINT [FK_SubscriptionsOptions.SubscriptionId_Subscriptions.Id] FOREIGN KEY([SubscriptionId])
REFERENCES [Subscriptions] ([Id])
GO
ALTER TABLE [SubscriptionsOptions] CHECK CONSTRAINT [FK_SubscriptionsOptions.SubscriptionId_Subscriptions.Id]
GO
ALTER TABLE [SubscriptionsOptions] WITH CHECK ADD CONSTRAINT [FK_SubscriptionsOptions.OptionId_Options.Id] FOREIGN KEY([OptionId])
REFERENCES [Options] ([Id])
GO
ALTER TABLE [SubscriptionsOptions] CHECK CONSTRAINT [FK_SubscriptionsOptions.OptionId_Options.Id]
GO
CREATE TABLE [UserSubscriptions](
[Id] [int] IDENTITY(1,1) NOT NULL,
[SubscriptionsOptionsId] [int] NOT NULL,
[userid] [int] NOT NULL
) ON [PRIMARY]
GO
ALTER TABLE [UserSubscriptions] ADD CONSTRAINT [PK_UserSubscriptions] PRIMARY KEY CLUSTERED
(
[Id] ASC
)WITH (STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF, ONLINE = OFF, OPTIMIZE_FOR_SEQUENTIAL_KEY = OFF) ON [PRIMARY]
GO
ALTER TABLE [UserSubscriptions] WITH CHECK ADD CONSTRAINT [FK_UserSubscriptions.SubscriptionsOptionsId_SubscriptionsOptions.Id] FOREIGN KEY([SubscriptionsOptionsId])
REFERENCES [SubscriptionsOptions] ([Id])
GO
ALTER TABLE [UserSubscriptions] CHECK CONSTRAINT [FK_UserSubscriptions.SubscriptionsOptionsId_SubscriptionsOptions.Id]
GO
INSERT INTO Options VALUES ('E-mail');
INSERT INTO Options VALUES ('Call');
INSERT INTO Options VALUES ('Fax');
INSERT INTO Subscriptions VALUES ('Promo1');
INSERT INTO Subscriptions VALUES ('Promo2');
INSERT INTO Subscriptions VALUES ('Promo3');
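INSERT INTO SubscriptionsOptions (SubscriptionId, OptionId) VALUES (1,1);
INSERT INTO SubscriptionsOptions (SubscriptionId, OptionId) VALUES (1,2);
INSERT INTO SubscriptionsOptions (SubscriptionId, OptionId) VALUES (1,3);
INSERT INTO SubscriptionsOptions (SubscriptionId, OptionId) VALUES (2,1);
INSERT INTO SubscriptionsOptions (SubscriptionId, OptionId) VALUES (2,2);
INSERT INTO SubscriptionsOptions (SubscriptionId, OptionId) VALUES (2,3);
INSERT INTO SubscriptionsOptions (SubscriptionId, OptionId) VALUES (3,1);
INSERT INTO SubscriptionsOptions (SubscriptionId, OptionId) VALUES (3,2);
INSERT INTO SubscriptionsOptions (SubscriptionId, OptionId) VALUES (3,3);
INSERT INTO UserSubscriptions (userid, SubscriptionsOptionsId) VALUES (1,1);
INSERT INTO UserSubscriptions (userid, SubscriptionsOptionsId) VALUES (1,2);
INSERT INTO UserSubscriptions (userid, SubscriptionsOptionsId) VALUES (1,3);
INSERT INTO UserSubscriptions (userid, SubscriptionsOptionsId) VALUES (1,4);
INSERT INTO UserSubscriptions (userid, SubscriptionsOptionsId) VALUES (1,5);
INSERT INTO UserSubscriptions (userid, SubscriptionsOptionsId) VALUES (1,6);
INSERT INTO UserSubscriptions (userid, SubscriptionsOptionsId) VALUES (1,7);
INSERT INTO UserSubscriptions (userid, SubscriptionsOptionsId) VALUES (1,8);
INSERT INTO UserSubscriptions (userid, SubscriptionsOptionsId) VALUES (1,9);
INSERT INTO UserSubscriptions (userid, SubscriptionsOptionsId) VALUES (2,1);
INSERT INTO UserSubscriptions (userid, SubscriptionsOptionsId) VALUES (2,2);
INSERT INTO UserSubscriptions (userid, SubscriptionsOptionsId) VALUES (2,3);
INSERT INTO UserSubscriptions (userid, SubscriptionsOptionsId) VALUES (2,4);
INSERT INTO UserSubscriptions (userid, SubscriptionsOptionsId) VALUES (2,5);
INSERT INTO UserSubscriptions (userid, SubscriptionsOptionsId) VALUES (3,1);
INSERT INTO UserSubscriptions (userid, SubscriptionsOptionsId) VALUES (3,3);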
Options
1 E-mail
2 Call
3 Fax
Subscriptions
1 Promo1
2 Promo2
3 Promo3
SubscriptionsOptions
id SubscriptionId OptionId
1 1 1
2 1 2
3 1 3
4 2 1
5 2 2
6 2 3
7 3 1
8 3 2
9 3 3
UserSubscriptions
userid SubscriptionOptionId
1 1
1 2
1 3
1 4
1 5
1 6
1 7
1 8
1 9
2 1
2 2
2 3
2 4
2 5
3 1
3 3
The final output of UserSubscriptions has to look like the table below. I am having a hard time writing the deletion script for the scenarios above, and I would really appreciate help with it.
userid SubscriptionOptionId
1 1
1 2
1 3
2 1
2 2
2 3
2 4
2 5
3 1

Your requirement does not seem to match your expected output.
Your requirement says "delete UserSubscriptions records with redundant options, meaning if a UserId is associated with two or more subscriptions with same options I need to keep the first subscription and then delete the rest of the subscriptions". But in your expected output that is not satisfied.
For example, in your expected output you have deleted subscriptionOptionId 4 from userId 1, but you have not deleted it from userId 2. Why not?
Conversely, for userId 3 you have deleted subscriptionOptionId 3 from userSubscriptions, but subscriptionOptionId 1 has optionId 1, whereas subscriptionOptionId 3 has optionId 3. These are different options, so according to your requirement neither should have been deleted.
A query which actually deletes rows according to your stated requirement (but which will not produce your expected result) would be:
delete sub
from userSubscriptions sub
join SubscriptionsOptions opt on opt.id = sub.SubscriptionsOptionsId
where exists
(
select *
from userSubscriptions prevSub
join SubscriptionsOptions prevOpt on prevOpt.Id = prevSub.SubscriptionsOptionsId
where prevSub.userId = sub.userId
and prevOpt.optionId = opt.optionId -- an earlier row for the same user already has this option
and prevSub.id < sub.id
)
Having said that, if I think about what this is modelling a little bit, I think the semantics of your requirements as described aren't really what you want. If I, as a user, am subscribed to two different promotions, it seems sensible that I could have both of those promotions use the same delivery option.
However, what would be redundant would be a situation where I am subscribed to a promotion, and the database says that I want that promo sent to me in a particular way, and it says it twice. For example, "allmhuran is subscribed to promo1 and wants it delivered by email", and "allmhuran is subscribed to promo1 and wants it delivered by email".
But there are no such redundancies in your sample data, because there are no redundancies in your subscriptionsOptions table. Every row is a unique combination of {subscriptionId, optionId}.
On the off chance that this is what you actually wanted to do, the delete statement only changes slightly:
delete sub
from userSubscriptions sub
join SubscriptionsOptions opt on opt.id = sub.SubscriptionsOptionsId
where exists
(
select *
from userSubscriptions prevSub
join SubscriptionsOptions prevOpt on prevOpt.Id = prevSub.SubscriptionsOptionsId
where prevSub.userId = sub.userId
and prevOpt.optionId = opt.optionId -- same delivery option
and prevOpt.subscriptionId = opt.subscriptionId -- this condition was not in the previous statement
and prevSub.id < sub.id
)
If that is the intention, then I would also say that the existing schema is "overnormalized". All you really need is a UserSubscriptionOptions table (userId int, subscriptionId int, optionId int, primary key (userId, subscriptionId, optionId)).
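As a rough sketch of that flatter design (the constraint names are made up for illustration, and the foreign keys assume the Subscriptions and Options tables from the question are kept):

```sql
-- One row per (user, subscription, delivery option).
-- The composite primary key makes a duplicate triple impossible.
CREATE TABLE UserSubscriptionOptions
(
    userId int NOT NULL,
    subscriptionId int NOT NULL,
    optionId int NOT NULL,
    CONSTRAINT PK_UserSubscriptionOptions
        PRIMARY KEY (userId, subscriptionId, optionId),
    CONSTRAINT FK_UserSubscriptionOptions_Subscriptions
        FOREIGN KEY (subscriptionId) REFERENCES Subscriptions (Id),
    CONSTRAINT FK_UserSubscriptionOptions_Options
        FOREIGN KEY (optionId) REFERENCES Options (Id)
);

INSERT INTO UserSubscriptionOptions (userId, subscriptionId, optionId) VALUES (1, 1, 1);
-- Repeating the same triple is rejected with a primary key violation:
-- INSERT INTO UserSubscriptionOptions (userId, subscriptionId, optionId) VALUES (1, 1, 1);
```

With this shape the redundancy described above cannot be stored in the first place, so no cleanup script is needed afterwards.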

This is simpler than it seems:
1. Create a view over UserSubscriptions joined to SubscriptionsOptions.
2. Order by user, then as desired, e.g. from "earliest" to "latest".
3. Add a hash across user id and option id.
4. For each duplicate hash, delete the corresponding SubscriptionsOptions row.
5. Once done, delete any Subscriptions rows with no corresponding SubscriptionsOptions rows.
This is doable in SQL or the imperative language of your choice.
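One possible way to express the same idea in T-SQL is sketched below. It is a variation, not the exact steps above: it uses ROW_NUMBER() partitioned by user and option in place of an explicit hash, a CTE in place of a view, and it removes the duplicate UserSubscriptions rows rather than SubscriptionsOptions rows. "Earliest" is assumed to mean the lowest UserSubscriptions.Id.

```sql
WITH ranked AS
(
    SELECT us.Id,
           -- Number the rows within each (user, option) group, earliest first.
           ROW_NUMBER() OVER (PARTITION BY us.userid, so.OptionId
                              ORDER BY us.Id) AS rn
    FROM UserSubscriptions us
    JOIN SubscriptionsOptions so ON so.Id = us.SubscriptionsOptionsId
)
-- Keep the first row of every group and delete the rest.
DELETE us
FROM UserSubscriptions us
JOIN ranked r ON r.Id = us.Id
WHERE r.rn > 1;
```

Running the SELECT inside the CTE on its own first is a cheap way to check which rows would get rn > 1 before anything is deleted.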

Related

Create combination sql table

I'm trying to create a SQL table in a database in VS that has room and userid columns, but the table should only accept a row if the userid exists in the Users table and the room exists in the Rooms table.
Allows:
Users table:
Userid
1
2
3
RoomUsers table:
Room ----- User
1 1
2 1
1 2
1 3
2 3
Won't allow:
Users table:
Userid
1
2
RoomUsers table:
Room ----- User
1 4
A normal foreign key won't work because it only allows one of each index and not multiple. How can I allow what I need?
(This would be a mess in comments)
We probably have an XY problem here. What you describe is solved simply with foreign keys, i.e.:
CREATE TABLE users (id INT IDENTITY NOT NULL PRIMARY KEY, ad VARCHAR(100));
CREATE TABLE rooms (id INT IDENTITY NOT NULL PRIMARY KEY, ad VARCHAR(100));
CREATE TABLE room_user
(
RoomId INT NOT NULL
, UserId INT NOT NULL
, CONSTRAINT PK_roomuser
PRIMARY KEY(RoomId, UserId)
, CONSTRAINT fk_room
FOREIGN KEY(RoomId)
REFERENCES dbo.rooms(id)
, CONSTRAINT fk_user
FOREIGN KEY(UserId)
REFERENCES dbo.users(id)
);
INSERT INTO dbo.users(ad)
OUTPUT
Inserted.id, Inserted.ad
VALUES('RayBoy')
, ('John')
, ('Frank');
INSERT INTO dbo.rooms(ad)
OUTPUT
Inserted.id, Inserted.ad
VALUES('Room1')
, ('Room2')
, ('Room3');
INSERT INTO dbo.room_user(RoomId, UserId)VALUES(1, 1), (1, 2), (2, 3);
-- won't allow
INSERT INTO dbo.room_user(RoomId, UserId)VALUES(999, 888);

SQL Server 2008: many-to-many relationship: Concatenation in SELECT query [duplicate]

There are 3 tables:
Project
Tool
LinkProjectTool
I need a query that lists everything in the Project table plus an extra column called ProjectTools. This column should contain a comma delimited string with all the tool names belonging to each project.
The data is:
Table Project:
ID Name Client
------------------------
0 table Anna
1 chair Bobby
2 workbench James
3 window Jenny
4 shelves Matthew
Table Tool:
ID Name
------------------------
0 hammer
1 measuring tape
2 pliers
3 scissors
4 spanner
5 saw
6 screwdriver
Table LinkProjectTool:
IDProject IDTool
-------------------
0 0
0 3
2 1
2 4
2 5
The result should be:
ID Name Client ProjectTools
-------------------------------------------------------------
0 table Anna hammer, scissors
1 chair Bobby
2 workbench James measuring tape, spanner, saw
3 window Jenny
4 shelves Matthew
Here are the queries I used to create these tables:
CREATE TABLE [dbo].[Project]
(
[ID] [int] NOT NULL,
[Name] [nvarchar](15) NOT NULL,
[Client] [nvarchar](15) NULL
)
INSERT INTO [dbo].[Project]
(ID, Name, Client)
VALUES
(0, 'table', 'Anna'),
(1, 'chair', 'Bobby'),
(2, 'workbench', 'James'),
(3, 'window', 'Jenny'),
(4, 'shelves', 'Matthew')
CREATE TABLE [dbo].[Tool](
[ID] [tinyint] IDENTITY(0,1) NOT NULL,
[Name] [nvarchar](30) NULL,
CONSTRAINT [PK_Tool] PRIMARY KEY CLUSTERED
(
[ID] ASC
)WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]
) ON [PRIMARY]
INSERT INTO [dbo].Tool
(Name)
VALUES
('hammer'),
('measuring tape'),
('pliers'),
('scissors'),
('spanner'),
('saw'),
('screwdriver')
CREATE TABLE [dbo].LinkProjectTool
(
[IDProject] [int] NOT NULL,
[IDTool] [tinyint] NULL
)
INSERT INTO [dbo].LinkProjectTool
(IDProject, IDTool)
VALUES
(0, 0),
(0, 3),
(2, 1),
(2, 4),
(2, 5)
Could you, please, help?
Thank you.
You can use the STUFF function together with FOR XML PATH (see this answer for a more detailed explanation of how they work).
Assuming you want the project tools separated by a comma and a space, you can use the following query:
SELECT DISTINCT p.ID, p.Name, p.Client,
ProjectTools = STUFF((
SELECT ', ' + t.Name
FROM Tool t
WHERE t.ID IN (SELECT IDTool FROM LinkProjectTool WHERE IdProject = p.ID)
FOR XML PATH(''), TYPE).value('.', 'NVARCHAR(MAX)'), 1, 2, '')
FROM Project p LEFT OUTER JOIN LinkProjectTool lpt ON p.Id = lpt.IDProject
ORDER BY p.ID
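To see why this works, it can help to look at the two pieces separately. The sketch below is only illustrative: it assumes the Tool table from the question, and the string literal in the second statement is made up for the demonstration.

```sql
-- FOR XML PATH('') concatenates the rows of the subquery into one string.
-- Each row contributes ', ' + Name, so the result starts with a stray ', '.
SELECT ', ' + t.Name
FROM Tool t
WHERE t.ID IN (0, 3)
FOR XML PATH('');

-- STUFF(string, start, length, replacement) deletes `length` characters
-- beginning at `start` and inserts the replacement there. Deleting two
-- characters from position 1 strips the leading ', '.
SELECT STUFF(', hammer, scissors', 1, 2, '') AS ProjectTools;  -- hammer, scissors
```

For projects without any tools the subquery returns no rows, so ProjectTools comes back as NULL; wrapping the expression in ISNULL(..., '') gives an empty string if that is preferred. On SQL Server 2017 and later, STRING_AGG does the same job more directly, but it is not available on SQL Server 2008.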

SQL Server 2005 Update Triggers not working for dependent columns

I have encountered a strange situation where the update trigger on a table is not updating columns that are dependent on other columns which are also getting updated during the update. Here is the background and the code for replicating this problem.
I have a commodities management application that keeps track of fruit prices every day. I need to calculate the price and volume trend for fruits on a daily basis. The daily fruit prices and the price and volume calculations are stored in the FruitTrades table. I have defined an update trigger on this table which calculates the price and volume trend whenever a row is inserted or updated in this table.
The daily fruit prices and volume come to me in a flat file which I import into a simple Table called PriceData. Then I move the Price and Volume information from this table to the FruitTrades table using a simple INSERT statement. This fires the update triggers in the FruitTrades, but two of the columns do not get updated by the trigger. Any idea why?
Steps for replicating this problem are as follows:
-- STEP 1 (create the FruitTrades table)
CREATE TABLE [dbo].[FruitTrades](
[FID] [nchar](3) NOT NULL,
[TradeDate] [smalldatetime] NOT NULL,
[TAID] [tinyint] NULL,
[Price] [real] NOT NULL,
[Vol] [int] NULL,
[3DAvgPrice] [real] NULL,
[5DAvgPrice] [real] NULL,
[VolTrend] [real] NULL,
[VolTrendPrevD] [real] NULL,
CONSTRAINT [PK_FruitTrades] PRIMARY KEY CLUSTERED
(
[FID] ASC,
[TradeDate] DESC
)WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]
) ON [PRIMARY];
-- STEP 2 (Create the Update trigger)
CREATE TRIGGER [dbo].[TRG_FruitTrades_Analysis]
ON [dbo].[FruitTrades]
AFTER INSERT, UPDATE
AS
BEGIN
-- SET NOCOUNT ON added to prevent extra result sets from
-- interfering with SELECT statements.
SET NOCOUNT ON;
UPDATE FruitTrades SET
-- Calculate the 3 day average price
[FruitTrades].[3DAvgPrice] =
(
SELECT AVG(Price) FROM
(
SELECT TOP 3 Price FROM FruitTrades
WHERE FID = [Inserted].[FID] AND TradeDate <= [Inserted].[TradeDate]
) AS Last3Trades
),
-- Calculate the 5 day average price
[FruitTrades].[5DAvgPrice] =
(
SELECT AVG(Price) FROM
(
SELECT TOP 5 Price FROM FruitTrades
WHERE FID = [Inserted].[FID] AND TradeDate <= [Inserted].[TradeDate]
) AS Last5Trades
),
-- Fetch the previous days VolTrend and update VolTrendPrev column
[FruitTrades].[VolTrendPrevD] =
(
SELECT TOP 1 VolTrend FROM FruitTrades
WHERE FID = [Inserted].[FID] AND TradeDate < [Inserted].[TradeDate]
),
-- Calculate Volume Trend and update VolTrend column
[FruitTrades].[VolTrend] =
(
ISNULL([FruitTrades].[VolTrendPrevD], 0) +
([Inserted].[Vol] * (([Inserted].[Price] /
(SELECT TOP 1 Price FROM FruitTrades WHERE FID = [Inserted].[FID] AND TradeDate < [Inserted].[TradeDate])) - 1.0 ))
),
-- Now Update the Action ID column
[FruitTrades].[TAID] =
(
CASE
WHEN [FruitTrades].[3DAvgPrice] >= [FruitTrades].[5DAvgPrice] AND [FruitTrades].[VolTrend] >= [FruitTrades].[VolTrendPrevD] THEN 1
WHEN [FruitTrades].[3DAvgPrice] >= [FruitTrades].[5DAvgPrice] AND [FruitTrades].[VolTrend] <= [FruitTrades].[VolTrendPrevD] THEN 2
WHEN [FruitTrades].[3DAvgPrice] <= [FruitTrades].[5DAvgPrice] AND [FruitTrades].[VolTrend] >= [FruitTrades].[VolTrendPrevD] THEN 3
WHEN [FruitTrades].[3DAvgPrice] <= [FruitTrades].[5DAvgPrice] AND [FruitTrades].[VolTrend] <= [FruitTrades].[VolTrendPrevD] THEN 4
ELSE NULL
END
)
FROM FruitTrades
INNER JOIN Inserted ON Inserted.FID = FruitTrades.FID AND Inserted.TradeDate = FruitTrades.TradeDate
END
-- STEP 3 (Create the PriceData table)
CREATE TABLE [dbo].[PriceData](
[FID] [nchar](3) NOT NULL,
[TradeDate] [smalldatetime] NOT NULL,
[Price] [real] NULL,
[Vol] [real] NULL,
CONSTRAINT [PK_PriceData] PRIMARY KEY CLUSTERED
(
[FID] ASC,
[TradeDate] ASC
)WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]
) ON [PRIMARY]
-- STEP 4 (simulate data import into PriceData table)
INSERT INTO PriceData (FID, TradeDate, Price, Vol) VALUES ('APL', '4/30/2012', 200, 1000);
INSERT INTO PriceData (FID, TradeDate, Price, Vol) VALUES ('APL', '4/29/2012', 190, 1200);
INSERT INTO PriceData (FID, TradeDate, Price, Vol) VALUES ('APL', '4/28/2012', 195, 1250);
INSERT INTO PriceData (FID, TradeDate, Price, Vol) VALUES ('APL', '4/27/2012', 205, 1950);
INSERT INTO PriceData (FID, TradeDate, Price, Vol) VALUES ('APL', '4/26/2012', 200, 2000);
INSERT INTO PriceData (FID, TradeDate, Price, Vol) VALUES ('APL', '4/25/2012', 180, 1300);
INSERT INTO PriceData (FID, TradeDate, Price, Vol) VALUES ('APL', '4/24/2012', 185, 1250);
-- STEP 5 (move price and volume data from the PriceData table to the FruitTrades table)
INSERT INTO FruitTrades (FID, TradeDate, Price, Vol) SELECT FID, TradeDate, Price, Vol FROM PriceData;
-- STEP 6 (check the FruitTrades table for correctness)
SELECT * FROM FruitTrades ORDER BY TradeDate
--- Results
After Step 6 you will find that the TAID and VolTrendPrevD columns in the FruitTrades table remain NULL.
Any help on how to resolve this problem is highly appreciated.
After laboring on this problem for 5 days, and some Googling, I finally found the solution myself.
The first problem was on account of my lack of understanding of triggers and how they fire. As per SQL documentation,
"SQL Server triggers fire only once per statement, not once per affected row"
Due to this design principle, when in Step 5, I do a bulk Insert from the PriceData table into the FruitTrades table, the trigger is fired only once, instead of once for each row. Hence the Updated values are incorrect.
The VolTrendPrevD remains null because the Select statement for it in the Update trigger always matches the first row in the FruitTrades table (since the Inserted table has multiple rows) and for this row, the VolTrend is null.
The TAID remains null because VolTrendPrevD is null.
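A related point that may also be at play here: within a single UPDATE statement, column references on the right-hand side of SET see the values from before the statement ran, not values assigned earlier in the same SET list. The minimal sketch below uses a made-up table variable @Demo to show the effect.

```sql
DECLARE @Demo TABLE (A int, B int);
INSERT INTO @Demo (A, B) VALUES (1, NULL);

-- B is computed from the value A had before the UPDATE (1), not from 10.
UPDATE @Demo SET A = 10, B = A + 1;

SELECT A, B FROM @Demo;  -- A = 10, B = 2
```

If this applies to the trigger in the question, the CASE expression for TAID would be reading the pre-update 3DAvgPrice, 5DAvgPrice and VolTrend values, which are still NULL for freshly inserted rows, so every comparison is unknown and the ELSE NULL branch is taken.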
Now the fix:
1. Import the text file containing the price data into an MS Access table. From there, do a bulk insert into the SQL Server table (using linked tables). This approach uses ODBC to convert the bulk insert into multiple single inserts, thus bypassing the first problem.
2. Convert VolTrend into a computed column. There is therefore no need to update it in the trigger.
3. Introduce an additional column PricePrevD in the FruitTrades table and update its value in the trigger, in the same manner as the VolTrendPrevD column.
4. Most importantly, ensure that the rows from Access are inserted in ascending order by date (by creating an appropriate date index in Access). Otherwise the desired results will be missing.
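The computed-column and PricePrevD changes described above might look roughly like the following. This is a sketch under assumptions, not the exact DDL that was used: it reuses the VolTrend formula from the original trigger, with PricePrevD standing in for the "previous price" subquery.

```sql
-- Assumption: PricePrevD holds the previous trading day's price and is
-- filled in by the trigger, in the same way as VolTrendPrevD.
ALTER TABLE dbo.FruitTrades ADD PricePrevD real NULL;
GO
-- Replace the stored VolTrend column with a computed one. The trigger must
-- no longer assign VolTrend, because a computed column cannot be updated.
ALTER TABLE dbo.FruitTrades DROP COLUMN VolTrend;
GO
ALTER TABLE dbo.FruitTrades ADD VolTrend AS
(
    ISNULL(VolTrendPrevD, 0) + Vol * ((Price / PricePrevD) - 1.0)
);
GO
```

The computed value is NULL whenever Vol or PricePrevD is NULL (for example on the first trading day of a fruit), and a PricePrevD of zero would raise a divide-by-zero error, so those cases may need explicit handling.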
Hope this helps... :-)

How to avoid bad data entries in table through proper Key relationship?

I have 3 tables: widgets, Widget_Type and Widget_Sub_type, with direct relationships between them. How can I prevent bad data from going into the widgets table based on Widget_type and Widget_sub_type? Please see my code. There is just one small thing missing.
Create table widgets (
Widget_ID int not null primary key,
Widget_type_ID int not null,
Widget_Sub_type_ID int,
Widget_Name varchar (50)
)
Create table Widget_Type (
Widget_Type_ID int not null primary key
)
Create table Widget_Sub_type (
Widget_Sub_Type_ID int not null primary key,
Widget_Type_ID int not null
)
---adding foreign key constraints
Alter table widgets
ADD constraint FK_Widget_Type
FOREIGN KEY (Widget_type_ID)
References Widget_Type (Widget_type_ID)
Alter table widgets
ADD constraint FK_Widget_Sub_Type
FOREIGN KEY (Widget_Sub_type_ID)
References Widget_SUB_Type (Widget_SUB_type_ID)
Alter table widget_Sub_Type
ADD Constraint FK_Widget_Type_Alter
Foreign key (widget_Type_ID)
References Widget_Type (Widget_Type_ID)
---- insert values
insert Widget_Type values (1)
insert Widget_Type values (5)
insert Widget_Sub_type values (3,1)
insert Widget_Sub_type values (4,1)
insert Widget_Sub_type values (7,5)
insert Widget_Sub_type values (9,5)
-- This will error out which is correct
insert Widget_Sub_type values (5,6)
select * from Widget_Sub_type
select * from Widget_type
--Good
insert widgets (Widget_ID,Widget_Name, Widget_type_ID, Widget_Sub_type_ID)
values (1, 'TOY', 1, 3)
select * from widgets
--Good
insert widgets (Widget_ID,Widget_Name, Widget_type_ID, Widget_Sub_type_ID)
values (2, 'BatMan', 5, 7)
-- How to prevent this? 3 is not a sub_type_id of type_ID 5. This is bad data and should not be inserted.
insert widgets (Widget_ID,Widget_Name, Widget_type_ID, Widget_Sub_type_ID)
values (3, 'Should Not', 5, 3)
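One common way to close this gap is a composite foreign key, so that the (sub type, type) pair in widgets has to exist as a pair in Widget_Sub_type. The following is a minimal sketch under that assumption; the constraint names UQ_Widget_Sub_type_Pair and FK_Widget_Sub_Type_Pair are made up for illustration.

```sql
-- A foreign key can only reference a primary key or a unique constraint,
-- so first make the (sub type, type) pair referenceable.
ALTER TABLE Widget_Sub_type
ADD CONSTRAINT UQ_Widget_Sub_type_Pair
UNIQUE (Widget_Sub_Type_ID, Widget_Type_ID);

-- Now require that the pair stored in widgets exists in Widget_Sub_type.
ALTER TABLE widgets
ADD CONSTRAINT FK_Widget_Sub_Type_Pair
FOREIGN KEY (Widget_Sub_type_ID, Widget_type_ID)
REFERENCES Widget_Sub_type (Widget_Sub_Type_ID, Widget_Type_ID);

-- With the constraint in place this insert fails, because (3, 5) is not a
-- row in Widget_Sub_type (sub type 3 belongs to type 1):
-- insert widgets (Widget_ID, Widget_Name, Widget_type_ID, Widget_Sub_type_ID)
-- values (3, 'Should Not', 5, 3)
```

Two things to keep in mind: adding the foreign key fails if the bad row has already been inserted, so it has to be removed first; and rows where Widget_Sub_type_ID is NULL are not checked by the composite key, though the existing FK_Widget_Type constraint still validates their type.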

How To Get A Hierarchical CTE In SQL Server To Filter With Parent and Child Logic

I'm having a vexing problem with a hierarchical CTE and some strange logic that we need to address that I really hope someone could assist with pointing out what I'm doing wrong to address this scenario with a CTE.
Here is the hierarchical data we're dealing with in this example:
This is the problematic SQL followed by the description of the problem and SQL statements to create a test table with data:
DECLARE @UserId nvarchar(50);
SET @UserId = 'A';
DECLARE @StatusType int;
SET @StatusType = '2';
;WITH recursiveItems (Id, Depth)
AS
(
SELECT Id, 0 AS Depth
FROM dbo.CteTest
WHERE UserId = @UserId
--AND StatusType = @StatusType
-- This would also be incorrect for the issue
AND ParentId IS NULL
UNION ALL
SELECT dbo.CteTest.Id, Depth + 1
FROM dbo.CteTest
INNER JOIN recursiveItems
ON dbo.CteTest.ParentId = recursiveItems.Id
WHERE UserId = @UserId
AND StatusType = @StatusType
)
SELECT A.*, recursiveItems.Depth
FROM recursiveItems
INNER JOIN dbo.CteTest A WITH(NOLOCK) ON
recursiveItems.Id = A.Id
ORDER BY A.Id
This is not returning the desired data. The data that is currently returned is in the NOT CORRECT section of the image below. The row with the Id of 10 is the row that we want to omit.
Essentially the logic should be that any parent record (record with children) where the status type of any of its children is equal to 2 should be returned along with its children. In the example this is the rows with Ids: 1, 5, 6, 7, 9.
Currently the CTE returns ALL parent records no matter what.
The record with the Id 1 should be returned, even though its status type is 1, because at least one of its descendants (children, grandchildren, etc.) has a status type equal to 2.
The record with the Id of 10 should not be returned because neither it nor any of its children has a status type equal to 2. If the record had a status type of 2 and no child records, it should also be returned.
This is the DDL to create a test table that helps to show the problem:
SET ANSI_NULLS ON
GO
SET QUOTED_IDENTIFIER ON
GO
CREATE TABLE [dbo].[CteTest](
[Id] [int] IDENTITY(1,1) NOT NULL,
[StatusType] [int] NOT NULL,
[UserId] [nvarchar](50) NOT NULL,
[ParentId] [int] NULL,
CONSTRAINT [PK_CteTest] PRIMARY KEY CLUSTERED
(
[Id] ASC
)WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]
) ON [PRIMARY]
GO
This is the seed data for the table, that can demonstrate the issue:
INSERT INTO [dbo].[CteTest]([StatusType],[UserId],[ParentId]) VALUES (1,'A',NULL)
INSERT INTO [dbo].[CteTest]([StatusType],[UserId],[ParentId]) VALUES (1,'B',NULL)
INSERT INTO [dbo].[CteTest]([StatusType],[UserId],[ParentId]) VALUES (2,'B',NULL)
INSERT INTO [dbo].[CteTest]([StatusType],[UserId],[ParentId]) VALUES (1,'A',1)
INSERT INTO [dbo].[CteTest]([StatusType],[UserId],[ParentId]) VALUES (2,'A',1)
INSERT INTO [dbo].[CteTest]([StatusType],[UserId],[ParentId]) VALUES (2,'A',5)
INSERT INTO [dbo].[CteTest]([StatusType],[UserId],[ParentId]) VALUES (2,'A',6)
INSERT INTO [dbo].[CteTest]([StatusType],[UserId],[ParentId]) VALUES (3,'A',6)
INSERT INTO [dbo].[CteTest]([StatusType],[UserId],[ParentId]) VALUES (2,'A',NULL)
INSERT INTO [dbo].[CteTest]([StatusType],[UserId],[ParentId]) VALUES (4,'A',NULL)
INSERT INTO [dbo].[CteTest]([StatusType],[UserId],[ParentId]) VALUES (3,'A',10)
The issue is that your base case includes all root (parentless) items, and there is no way to filter them out later.
Because you are looking only for items with a particular StatusType, you may want to restructure the CTE: instead of using the root rows as the base case, use all items with the given StatusType and then recursively find their parents. In the solution below, Depth is a negative number giving the distance from an item with the given StatusType (2 in this example) in the tree, so it is really a negative height rather than a depth.
DECLARE @UserId nvarchar(50);
SET @UserId = 'A';
DECLARE @StatusType int;
SET @StatusType = '2';
WITH recursiveItems (Id, ParentID, Depth)
AS
(
SELECT Id, ParentID, 0 AS Depth
FROM dbo.CteTest
WHERE UserId = @UserId AND StatusType = @StatusType
UNION ALL
SELECT dbo.CteTest.Id, CteTest.ParentID, Depth - 1
FROM dbo.CteTest
INNER JOIN recursiveItems
ON dbo.CteTest.Id = recursiveItems.ParentId
WHERE UserId = @UserId
)
SELECT A.Id, A.StatusType, A.UserId, A.ParentId, min(recursiveItems.Depth)
FROM recursiveItems
INNER JOIN dbo.CteTest A WITH(NOLOCK) ON
recursiveItems.Id = A.Id
group by A.Id, A.StatusType, A.UserId, A.ParentId
ORDER BY A.Id