SQL Server, self-referencing compound foreign key

I have a table with columns task_id (pk), client_id, parent_task_id, title. In other words, tasks are owned by clients, and some tasks have child tasks.
For example, client 7 may have a task "wash the car," with child tasks "vacuum carpet" and "wipe dashboard."
I want a constraint so that a task and its children are always owned by the same client.
After a bit of experimentation, I created a self-referencing foreign key (client_id, parent_task_id) referencing (client_id, task_id). At first I received an error (There are no primary or candidate keys in the referenced table that match the referencing column list in the foreign key.), so I added a unique key on columns (task_id, client_id). Now it seems to work.
I am wondering if this is the best solution (or at least reasonable one) to enforce this constraint. Any thoughts would be appreciated. Thanks much!
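Below is a minimal sketch of the composite-key approach from the question. It runs against SQLite through Python's built-in sqlite3 module instead of SQL Server, which is an assumption made so the example is self-contained. The table name `tasks` is hypothetical. Like SQL Server, SQLite needs a UNIQUE constraint on the referenced (client_id, task_id) pair. A NULL parent_task_id leaves root tasks unconstrained.

```python
import sqlite3

def make_db():
    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces FKs only when enabled
    conn.execute("""
        CREATE TABLE tasks (
            task_id        INTEGER PRIMARY KEY,
            client_id      INTEGER NOT NULL,
            parent_task_id INTEGER NULL,
            title          TEXT NOT NULL,
            -- candidate key that the composite foreign key can reference
            UNIQUE (client_id, task_id),
            -- a child must point at a parent owned by the same client
            FOREIGN KEY (client_id, parent_task_id)
                REFERENCES tasks (client_id, task_id)
        )""")
    return conn

def try_insert(conn, row):
    """Insert one task; return True if the constraints allowed it."""
    try:
        conn.execute("INSERT INTO tasks VALUES (?, ?, ?, ?)", row)
        return True
    except sqlite3.IntegrityError:
        return False

conn = make_db()
root_ok = try_insert(conn, (1, 7, None, "wash the car"))            # NULL parent: not checked
child_ok = try_insert(conn, (2, 7, 1, "vacuum carpet"))             # (7, 1) exists
wrong_client_ok = try_insert(conn, (3, 8, 1, "wipe dashboard"))     # (8, 1) does not exist
```

The third insert is rejected because no row has client_id 8 and task_id 1. That is exactly the "same client as the parent" rule.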

A 'parent' record would not need a [parent_task_id]
TASK ID | CLIENT ID | PARENT TASK ID | TITLE
1 | 7 | NULL | wash the car
(To find all of your parent records, SELECT * FROM TABLE WHERE [parent_task_id] is null)
A 'child' record would need a [parent_task_id], but not a [client_id] (because, as you stipulate, a child has the same client as its parent).
TASK ID | CLIENT ID | PARENT TASK ID | TITLE
2 | NULL | 1 | vacuum carpet
3 | NULL | 1 | wipe dashboard
In this way, your self-referencing foreign key is all the constraint you need. No constraint / rule concerning [client_id] on child records is necessary, because all [client_id] values on child records will be ignored, in favor of the [client_id] on the parent record.
For example, if you want to know what the [client_id] is for a child record:
SELECT
c.task_id,
p.client_id,
c.title
FROM
[table] p --parent
INNER JOIN [table] c --child
ON p.task_id = c.parent_task_id
UPDATE
(How to query for the client ID of a grand-child)
--Create and populate your table (using a table variable in this sample)
DECLARE @table table (task_id int, client_id int, parent_task_id int, title varchar(50))
INSERT INTO @table VALUES (1,7,NULL,'wash the car')
INSERT INTO @table VALUES (2,NULL,1,'vacuum carpet')
INSERT INTO @table VALUES (3,NULL,1,'wipe dashboard')
INSERT INTO @table VALUES (4,NULL,2,'Step 1: plug-in the vacuum')
INSERT INTO @table VALUES (5,NULL,2,'Step 2: turn-on the vacuum')
INSERT INTO @table VALUES (6,NULL,2,'Step 3: use the vacuum')
INSERT INTO @table VALUES (7,NULL,2,'Step 4: turn-off the vacuum')
INSERT INTO @table VALUES (8,NULL,2,'Step 5: empty the vacuum')
INSERT INTO @table VALUES (9,NULL,2,'Step 6: put-away the vacuum')
INSERT INTO @table VALUES (10,NULL,3,'Step 1: spray cleaner on the rag')
INSERT INTO @table VALUES (11,NULL,3,'Step 2: use the rag')
INSERT INTO @table VALUES (12,NULL,3,'Step 3: put-away the cleaner')
INSERT INTO @table VALUES (13,NULL,3,'Step 4: toss the rag in the laundry bin')
--Determine which grandchild you want the client_id for
DECLARE @task_id int
SET @task_id = 8 -- grandchild's ID to use to find client_id
--Create your CTE (this is the recursive part)
;WITH myList (task_id, client_id, parent_task_id, title)
AS
(
SELECT a.task_id, a.client_id, a.parent_task_id, a.title
FROM @table a
WHERE a.task_id = @task_id
UNION ALL
SELECT a.task_id, a.client_id, a.parent_task_id, a.title
FROM @table a
INNER JOIN myList m
ON a.task_id = m.parent_task_id
)
--Query your CTE
SELECT task_id, client_id, title FROM myList WHERE client_id is not null
In this example, I used a grandchild's task_id (8 -- 'empty the vacuum') to find its highest-level parent, which holds the client_id.
You can remove the WHERE clause from the last step if you want to see each parent, parent's parent, and so on up to the first-parent's record.
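You can try the same walk up the hierarchy outside SQL Server with Python's built-in sqlite3 module. SQLite spells the recursive CTE `WITH RECURSIVE`; otherwise the query follows the myList CTE above. The table name `tasks`, the helper `client_of`, and the reduced row set are assumptions made for this sketch.

```python
import sqlite3

# (task_id, client_id, parent_task_id, title): only the root carries client_id
ROWS = [
    (1, 7, None, "wash the car"),
    (2, None, 1, "vacuum carpet"),
    (3, None, 1, "wipe dashboard"),
    (8, None, 2, "Step 5: empty the vacuum"),
    (13, None, 3, "Step 4: toss the rag in the laundry bin"),
]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tasks (task_id INTEGER PRIMARY KEY, client_id INTEGER,"
             " parent_task_id INTEGER, title TEXT)")
conn.executemany("INSERT INTO tasks VALUES (?, ?, ?, ?)", ROWS)

ANCESTORS = """
WITH RECURSIVE my_list (task_id, client_id, parent_task_id) AS (
    SELECT task_id, client_id, parent_task_id FROM tasks WHERE task_id = ?
    UNION ALL
    -- step from the current row to its parent
    SELECT t.task_id, t.client_id, t.parent_task_id
    FROM tasks AS t JOIN my_list AS m ON t.task_id = m.parent_task_id
)
SELECT client_id FROM my_list WHERE client_id IS NOT NULL
"""

def client_of(task_id):
    """Return the client_id found on the topmost ancestor, or None if task_id is unknown."""
    row = conn.execute(ANCESTORS, (task_id,)).fetchone()
    return row[0] if row else None
```

For example, `client_of(8)` climbs 8 → 2 → 1 and returns 7.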

Related

How do I get distinct combinations of one XRef column related to any value in the other XRef column

I need to select the count of unique value combinations of column B in an XRef table which is grouped by column A.
Consider the following schema and data, which represents a simple family structure. Each child has a father and mother:
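TABLE Father
| FatherID | Name |
|----------|------|
| 1 | Alex |
| 2 | Bob |
TABLE Mother
| MotherID | Name |
|----------|------|
| 1 | Alice |
| 2 | Barbara |
TABLE Child
| ChildID | FatherID | MotherID | Name |
|---------|----------|----------|------|
| 1 | 1 (Alex) | 1 (Alice) | Adam |
| 2 | 1 (Alex) | 1 (Alice) | Billy |
| 3 | 1 (Alex) | 2 (Barbara) | Celine |
| 4 | 2 (Bob) | 2 (Barbara) | Derek |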
The distinct combinations of mothers for each father are:
Alex (Alice, Barbara)
Bob (Barbara)
In all there are two distinct combinations of mothers:
Alice, Barbara
Barbara
The query I want to write would return the count of those distinct combinations of mother, regardless of which father they are associated with:
| UniqueMotherGroups |
|--------------------|
| 2 |
I was able to do this successfully using the STRING_AGG function, but it feels clunky. It also needs to operate over millions of rows and is quite slow at the moment. Is there a more idiomatic way to do this with set operations instead?
Here is my working example:
-- Drop pre-existing tables
DROP TABLE IF EXISTS dbo.Child;
DROP TABLE IF EXISTS dbo.Father;
DROP TABLE IF EXISTS dbo.Mother;
-- Create family tables.
CREATE TABLE dbo.Father
(
FatherID INT NOT NULL
, Name VARCHAR(50) NOT NULL
);
ALTER TABLE dbo.Father
ADD CONSTRAINT PK_Father
PRIMARY KEY CLUSTERED (FatherID);
ALTER TABLE dbo.Father SET (LOCK_ESCALATION = TABLE);
CREATE TABLE dbo.Mother
(
MotherID INT NOT NULL
, Name VARCHAR(50) NOT NULL
);
ALTER TABLE dbo.Mother
ADD CONSTRAINT PK_Mother
PRIMARY KEY CLUSTERED (MotherID);
ALTER TABLE dbo.Mother SET (LOCK_ESCALATION = TABLE);
CREATE TABLE dbo.Child
(
ChildID INT NOT NULL
, FatherID INT NOT NULL
, MotherID INT NOT NULL
, Name VARCHAR(50) NOT NULL
);
ALTER TABLE dbo.Child
ADD CONSTRAINT PK_Child
PRIMARY KEY CLUSTERED (ChildID);
CREATE NONCLUSTERED INDEX IX_Parents ON dbo.Child (FatherID, MotherID);
ALTER TABLE dbo.Child
ADD CONSTRAINT FK_Child_Father
FOREIGN KEY (FatherID)
REFERENCES dbo.Father (FatherID);
ALTER TABLE dbo.Child
ADD CONSTRAINT FK_Child_Mother
FOREIGN KEY (MotherID)
REFERENCES dbo.Mother (MotherID);
-- Insert two children with the same parents
INSERT INTO dbo.Father
(
FatherID
, Name
)
VALUES
(1, 'Alex')
, (2, 'Bob')
, (3, 'Charlie')
INSERT INTO dbo.Mother
(
MotherID
, Name
)
VALUES
(1, 'Alice')
, (2, 'Barbara');
INSERT INTO dbo.Child
(
ChildID
, FatherID
, MotherID
, Name
)
VALUES
(1, 1, 1, 'Adam')
, (2, 1, 1, 'Billy')
, (3, 1, 2, 'Celine')
, (4, 2, 2, 'Derek')
, (5, 3, 1, 'Eric');
-- CTE Gets distinct combinations of parents
WITH distinctParentCombinations (FatherID, MotherID)
AS (SELECT children.FatherID
, children.MotherID
FROM dbo.Child as children
GROUP BY children.FatherID
, children.MotherID
)
-- CTE uses STRING_AGG to get unique combinations of mothers.
, motherGroups (Mothers)
AS (SELECT STRING_AGG(CONVERT(VARCHAR(MAX), distinctParentCombinations.MotherID), '-') WITHIN GROUP (ORDER BY distinctParentCombinations.MotherID) AS Mothers
FROM distinctParentCombinations
GROUP BY distinctParentCombinations.FatherID
)
-- Remove the COUNT function to see the actual combinations
SELECT COUNT(motherGroups.Mothers) AS UniqueMotherGroups
FROM motherGroups
-- Clean up the example
DROP TABLE IF EXISTS dbo.Child;
DROP TABLE IF EXISTS dbo.Father;
DROP TABLE IF EXISTS dbo.Mother;
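Here is a plain-Python model of what the query has to compute. It is not SQL and not a performance suggestion. Each father is reduced to the set of his distinct mothers, and the distinct sets are counted. `frozenset` makes {1, 2} and {2, 1} compare equal, which is the job WITHIN GROUP (ORDER BY ...) does in the STRING_AGG version. The function name is hypothetical.

```python
from collections import defaultdict

# (ChildID, FatherID, MotherID, Name) rows from the script above
CHILDREN = [
    (1, 1, 1, "Adam"),
    (2, 1, 1, "Billy"),
    (3, 1, 2, "Celine"),
    (4, 2, 2, "Derek"),
    (5, 3, 1, "Eric"),
]

def unique_mother_groups(children):
    mothers_by_father = defaultdict(set)
    for _child_id, father_id, mother_id, _name in children:
        mothers_by_father[father_id].add(mother_id)  # repeated mothers collapse here
    # frozenset is hashable and order-independent, so equal groups dedupe
    return len({frozenset(m) for m in mothers_by_father.values()})
```

With only the four children from the tables in the question, the count is 2. The script's fifth row (Eric: father 3, mother 1) adds the group {Alice}, so the full script yields 3.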
You have a great explanation and setup of your "problem case".
Your setup runs great in (for example) tempdb.
You have solved the problem in a nice way, and I don't think you can optimize it much further if you are going to calculate the mother groups every time you run the query.
There is one small mistake, though: you must do a COUNT(DISTINCT motherGroups.Mothers) in your final count.
Since you mention millions of rows, I would suggest a slightly different approach.
If you aggregate the mother groups as soon as there is a change in the Child table, your query can run fast every time, even with millions of rows.
Queries like this are seldom run only once, so it is nice if the heavy work is already done.
Usually I prefer not to use triggers, because you get extra logic in a place where it could be hard to find and debug.
But sometimes triggers are nice to have, especially when you are not able to change the source code running on the clients.
So, my solution is to add a new column to the Father table and to create a trigger which (re)generates the mother group each time there is a change in the Child table.
This way, the hard aggregation work for each father is done as soon as there is a change, and you don't have to aggregate when you run your query.
Since you already have millions of rows, we also have to update these existing rows.
I have used SQL Server 2019 for this solution.
*** The solution ***
Add 1 or 2 new columns to the Father table.
Whether you add 1 or 2 depends on your preferences:
"Do I want to see the aggregated mother groups for debugging purpose, or do I just trust the hashed values?"
Column 1: Hashed value of the aggregated mother group for each Father row.
The hashed value is VARBINARY. For SHA2_256 it is always exactly 32 bytes, however long the aggregated mother group is, so VARBINARY(32) would be enough. We will use VARBINARY(1600) anyway:
1600 is less than 1700, which is the max key size for a nonclustered index, so we will not have any problems indexing the column.
-- Column 1: Hashed value of the aggregated mother group for each Father row.
alter table Father add MotherHash varbinary(1600)
create index IX_MotherHash on Father(MotherHash)
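SHA-256 digests have a fixed size. Python's standard hashlib module implements the same SHA-256 algorithm that HASHBYTES('SHA2_256', ...) names, so it can show this: the digest stays 32 bytes however long the mother group string gets. `mother_hash` is a hypothetical helper that mimics the trigger's ',1,2' style input.

```python
import hashlib

def mother_hash(mother_ids):
    """Build ',' + each distinct MotherID in ascending order, then hash it with SHA-256."""
    group = "".join("," + str(m) for m in sorted(set(mother_ids)))
    return hashlib.sha256(group.encode("ascii")).digest()

short_digest = mother_hash([2, 1, 1])          # hashes ',1,2'
long_digest = mother_hash(range(1, 100_000))   # a very long mother group
```

Because the key is built from sorted, distinct IDs, [2, 1, 1] and [1, 2] produce the same hash.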
Column 2: This column is more optional, and depends on your preferences.
The column could be nice to have for debugging purposes if anyone questions the result.
Which VARCHAR length you should use depends on your real data.
MAX? Then you have no problems storing the mother groups, but a varchar(max) column cannot be used as an index key column. But maybe you don't need to index it?
1700? Then you are able to index the column (1700 bytes is the max key size for a nonclustered index), but depending on your real data, will this cover the biggest mother group?
Why index it? If you want to list the aggregated mother groups, it could be faster to read the index than the whole table.
As said, this depends on you (and your data). If we have no need to see the aggregated mother groups, then we don't need this column at all.
For this demo/solution we will add the column for debugging purposes, without any indexing.
-- Column 2: This column is more optional, and depends on your preferences.
alter table Father add MotherGroup varchar(MAX)
go
Create a trigger on the Child table.
It will handle all inserts, updates and deletes in the Child table.
create or alter trigger trIUD_Child on Child
after insert, update, delete
as
begin
set nocount on
-- Get all FatherIDs from the Inserted and Deleted table.
-- An ordinary Temp table is created with a clustered index to get SEEK performance later.
-- The table might also have more than 100 rows, where table variables are not recommended.
declare @numRowsInInsertedDeleted int
create table #rowsInInsertedDeleted(rowId int identity(1, 1), FatherID int)
create unique clustered index ix on #rowsInInsertedDeleted(rowId)
insert #rowsInInsertedDeleted(FatherID)
select distinct f.FatherID
from
(
select i.FatherID from inserted i
union all
select i.FatherID from deleted i
) f
select @numRowsInInsertedDeleted = max(rowId) from #rowsInInsertedDeleted
-- We have to loop over each of the FatherIDs, since we might have several rows in the Inserted and Deleted tables.
declare @rowId int = 0
while (@rowId < @numRowsInInsertedDeleted)
begin
-- Get the father for the next row.
select @rowId += 1
declare @fatherId int
select @fatherId = r.FatherID
from #rowsInInsertedDeleted r
where r.rowId = @rowId
-- Aggregate the mothers for this father.
declare @motherGroup varchar(max) = ''
select @motherGroup += ',' + cast(c.MotherID as varchar)
from Child c
where c.FatherID = @fatherId
group by c.MotherID
order by c.MotherID
-- Update the father record.
-- Any empty strings are handled automatically, skip the leading ','.
update Father
set MotherGroup = substring(@motherGroup, 2, 2147483647),
MotherHash = HASHBYTES('SHA2_256', @motherGroup)
where FatherID = @fatherId
end
end
go
Updating existing rows
Since you already have millions of rows, we must aggregate the mother groups for these existing rows.
If you don't have the disk space for logging the update of the whole table, maybe you should take your database out of AG and switch to Simple recovery model for this task?
In that case you should also modify the update with a WHERE clause to update only parts of the table, and run the update for each part until the whole table is updated.
Example: update Child set FatherID = FatherID where FatherID between 1 and 1000000
Note: This update statement could block access to the Child table for other users.
-- Aggregate the mother groups for the existing rows.
-- This could take minutes to complete, depending on the number of rows.
-- NOTE: This update statement could block access to the Child table for other users.
update Child set FatherID = FatherID
That's it!
You should now be able to quickly get the mother groups on existing rows, and also after future changes in the Child table.
-- Voila - now you can get the unique mother groups any time at a fast speed.
select count(distinct MotherHash) from Father
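The trigger's bookkeeping can be modeled in a few lines of Python. This is a sketch of the idea, not the trigger itself, and the class name is hypothetical. One group key is kept per father, and only the fathers touched by a change are recomputed, which is what the loop over inserted and deleted does. Reading the count then needs no aggregation over the child rows.

```python
from collections import defaultdict

class MotherGroupCache:
    """Hypothetical in-memory stand-in for the Child table plus Father.MotherGroup."""

    def __init__(self):
        self.children = {}   # ChildID -> (FatherID, MotherID)
        self.group_key = {}  # FatherID -> '1,2'-style key, like MotherGroup

    def _refresh(self, father_ids):
        # Rebuild keys for the affected fathers only, as the trigger's WHILE loop does.
        # (This model scans every child; the real trigger filters Child by FatherID.)
        mothers = defaultdict(set)
        for father_id, mother_id in self.children.values():
            if father_id in father_ids:
                mothers[father_id].add(mother_id)
        for father_id in father_ids:
            self.group_key[father_id] = ",".join(str(m) for m in sorted(mothers[father_id]))

    def upsert(self, child_id, father_id, mother_id):
        touched = {father_id}
        if child_id in self.children:  # an UPDATE also touches the old father
            touched.add(self.children[child_id][0])
        self.children[child_id] = (father_id, mother_id)
        self._refresh(touched)

    def delete(self, child_id):
        father_id, _ = self.children.pop(child_id)
        self._refresh({father_id})

    def unique_groups(self):
        # Counterpart of count(distinct MotherHash): no per-query aggregation
        return len(set(self.group_key.values()))
```

In this model, a father whose last child is deleted keeps an empty key and is still counted.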
Thank you for posting such a comprehensive setup for the test data. However, I'm not running any CREATE/DROP statements against my DB, so I converted those tables into table variables. Using your data, I came up with the following query. Just change the table names back to your dbo. names and you should be able to test in your environment. I basically concatenate every father/mother combo into a text string using FOR XML PATH. Then I count up all the distinct combos. If you find an error in my logic, let me know. I'm just tossing this into the ring of possible solutions.
WITH distinctCombos AS (
SELECT DISTINCT
c.FatherID, c.MotherID
FROM @Child as c
) , motherComboCount AS (
SELECT
f.FatherID
, f.[Name]
, STUFF((
SELECT
',' + CAST(dc.MotherID as nvarchar)
FROM distinctCombos as dc
WHERE dc.FatherID = f.FatherID
ORDER BY dc.MotherID ASC
FOR XML PATH('')
),1,1,'') as motherList
FROM @Father as f
)
SELECT
COUNT(DISTINCT motherList) as UniqueMotherGroups
FROM motherComboCount as mcc
To save a bit of compute power, remove the STUFF function as it's not necessary for the comparison... it just makes the list nicer to look at if displaying... and I'm in the habit of using it.
It looks like the main differences between our methods are the use of FOR XML PATH vs STRING_AGG (I'm still on older SQL) and that I use DISTINCT twice instead of GROUP BY. If you have a larger dataset to test against, let me know how the 2 methods compare. I'm trying to think of a completely set-based method but I can't see it at the moment.
Update: Method 2.
Here's an idea I had using recursive CTEs to build the distinct mother combinations. In your example data, there are only 2 mothers per father. So there would be a total of 4 set-based queries performed (first CTE, 2 queries in the recursive CTE and the final SELECT).
WITH uniqueCombo as (
SELECT DISTINCT
c.FatherID
, c.MotherID
, DENSE_RANK() OVER(PARTITION BY c.FatherID ORDER BY c.MotherID) as row_num
FROM @Child as c
), combos as (
SELECT
uc.FatherID
, uc.MotherID
, CAST(uc.MotherID as nvarchar(max)) as [path]
, row_num
, 0 as hierarchy_num
FROM uniqueCombo as uc
WHERE uc.row_num = 1
UNION ALL
SELECT
uc.FatherID
, uc.MotherID
, co.[path] + ',' + CAST(uc.MotherID as nvarchar(max))
, uc.row_num
, co.hierarchy_num + 1 as hierarchy_num
FROM uniqueCombo as uc
INNER JOIN combos as co
ON co.FatherID = uc.FatherID
--AND co.MotherID <> uc.MotherID
AND co.row_num + 1 = uc.row_num
), rankedCombos as (
SELECT
c.[path]
, ROW_NUMBER() OVER(PARTITION BY c.FatherID ORDER BY c.hierarchy_num DESC) as row_num
FROM combos as c
)
SELECT COUNT(DISTINCT rc.[path]) as UniqueMotherGroups
FROM rankedCombos as rc
WHERE rc.row_num = 1
Update 2:
I had another idea to use a PIVOT to transpose the records so that the FatherID would be in the left-most column with the MotherIDs as the column headers. To make that work with a dynamic list of MotherIDs, you have to use a dynamic PIVOT/dynamic SQL. (FatherID isn't really needed in the PIVOT so it's not included in the PIVOT query. I just had to describe what the goal is.) After the pivot, you can SELECT DISTINCT to get the unique mother combinations. Then the last SELECT is to get the COUNT. This one I ran in SQL Fiddle:
MS SQL Server 2017 Schema Setup: the same CREATE TABLE, ALTER TABLE and INSERT statements as in the question's script (without the DROP statements).
Query 1:
DECLARE @cols AS nvarchar(MAX)
DECLARE @query AS nvarchar(MAX)
SET @cols = STUFF((
SELECT DISTINCT ',' + QUOTENAME(m.MotherID)
FROM Mother as m
FOR XML PATH(''))
,1,1,'')
SET @query = 'SELECT COUNT(mCount) as UniqueMotherGroups FROM (
SELECT DISTINCT ' + @cols + ', 1 as mCount FROM (
SELECT ' + @cols + '
FROM (
SELECT
c.FatherID
, c.MotherID
, 1 as mID
FROM child as c
) x
PIVOT
(
MAX(mID)
FOR MotherID in (' + @cols + ')
) p
) as m
) as mg'
--SELECT @query
Exec(@query)
Results:
| UniqueMotherGroups |
|--------------------|
| 3 |
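The dynamic PIVOT turns each father into one row of flags, with one column per MotherID. Below is a plain-Python model of that transposition, with hypothetical names. The tuple stands in for the pivoted row, and None stands in for a NULL cell. Taking the distinct tuples and counting them mirrors the SELECT DISTINCT and the final COUNT.

```python
CHILD_ROWS = [(1, 1), (1, 1), (1, 2), (2, 2), (3, 1)]  # (FatherID, MotherID)
MOTHER_IDS = [1, 2]                                     # the [1],[2] pivot columns

def pivot_rows(child_rows, mother_ids):
    present = {}
    for father_id, mother_id in child_rows:
        present.setdefault(father_id, set()).add(mother_id)
    # One row per father: 1 where the mother occurs, None (NULL) elsewhere,
    # mirroring MAX(mID) over the implicit grouping by FatherID.
    return {
        father_id: tuple(1 if m in mothers else None for m in mother_ids)
        for father_id, mothers in present.items()
    }

rows = pivot_rows(CHILD_ROWS, MOTHER_IDS)
unique_mother_groups = len(set(rows.values()))  # SELECT DISTINCT, then COUNT
```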
UPDATE 3: Here's one other idea... create a results table with a unique constraint and with IGNORE_DUP_KEY=ON. You could use this in a function or stored procedure, or, setup a trigger to put the mother combinations into a unique-combo-holding-table. With IGNORE_DUP_KEY=ON, you can insert every combo and only the unique combos will remain. Then just do a count of all the rows.
--Create a table to hold the results:
CREATE TABLE results (
FatherID int not null
, UniqueCombos nvarchar(50) not null
PRIMARY KEY WITH (IGNORE_DUP_KEY = ON)
);
--Insert one combo per father into the results table. IGNORE_DUP_KEY discards duplicate combos, so only unique entries remain.
INSERT INTO results (FatherID, UniqueCombos)
SELECT
f.FatherID
, (
SELECT ',' + CAST(c2.MotherID as nvarchar(500))
FROM Child as c2
WHERE c2.FatherID = f.FatherID
GROUP BY c2.MotherID
ORDER BY c2.MotherID
FOR XML PATH('')
) as mother_combos
FROM (SELECT DISTINCT FatherID FROM Child) as f
;
--Count up all the rows in the results table. Since these are all unique combinations, it should be fast to sum.
SELECT COUNT(*)
FROM results;
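SQLite has no IGNORE_DUP_KEY, but INSERT OR IGNORE against a PRIMARY KEY column behaves similarly: rows with a duplicate key are skipped instead of raising an error. Below is a sketch of that swapped-in SQLite analog, run through Python's sqlite3. The lowercase table names are hypothetical. The combination strings are built in Python because SQLite's group_concat does not promise an order in older versions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE child (child_id INTEGER PRIMARY KEY, father_id INTEGER, mother_id INTEGER)")
conn.executemany("INSERT INTO child VALUES (?, ?, ?)",
                 [(1, 1, 1), (2, 1, 1), (3, 1, 2), (4, 2, 2), (5, 3, 1)])
# The combination string itself is the key, as with the results table above.
conn.execute("CREATE TABLE results (unique_combos TEXT PRIMARY KEY)")

# One combination string per father: ',' + each distinct mother in ascending order.
combos = {}
for father_id, mother_id in conn.execute(
        "SELECT DISTINCT father_id, mother_id FROM child ORDER BY father_id, mother_id"):
    combos[father_id] = combos.get(father_id, "") + "," + str(mother_id)

def store_all():
    # Duplicates are skipped silently, similar to IGNORE_DUP_KEY = ON.
    conn.executemany("INSERT OR IGNORE INTO results VALUES (?)",
                     [(c,) for c in combos.values()])
    return conn.execute("SELECT COUNT(*) FROM results").fetchone()[0]

unique_count = store_all()
```

Running `store_all()` a second time leaves the count unchanged, because every combo is already present.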
If you accept to define a maximum number of mothers per father (here 7) you may try:
select count(*) as UniqueMotherGroups from (
select distinct m1, m2, m3, m4, m5, m6, m7 from (
select FatherID, row_number() over(partition by FatherID order by motherid) as rn, motherid
from (
select distinct FatherID, MotherID
from t_Child
)
)
pivot (
max(motherid) for rn in (1 as m1,2 as m2,3 as m3,4 as m4,5 as m5,6 as m6,7 as m7)
)
)
;
UNIQUEMOTHERGROUPS
------------------
3
Here is one idea. Instead of using the exact STRING_AGG, you can calculate a hash / checksum of the group. You don't need to know the exact composition of the group; you just need to distinguish between different groups. Calculating the hash may be faster than concatenating strings.
SQL Server has a function CHECKSUM_AGG
You can write your own hashing function with CLR.
Sample data
CREATE TABLE #Child
(
ChildID INT NOT NULL IDENTITY PRIMARY KEY
,FatherID INT NOT NULL
,MotherID INT NOT NULL
,Name VARCHAR(50) NOT NULL
);
INSERT INTO #Child
(
FatherID
,MotherID
,Name
)
VALUES
(1, 1, 'Adam')
,(1, 1, 'Billy')
,(1, 2, 'Celine')
,(2, 2, 'Derek')
,(3, 1, 'Eric')
,(4, 1, 'A')
,(4, 1, 'B')
,(4, 2, 'C')
,(4, 2, 'D')
,(4, 2, 'E')
,(5, 2, 'F')
,(6, 2, 'G')
;
Query
WITH
distinctParentCombinations
AS
(
SELECT
FatherID
,MotherID
FROM #Child
GROUP BY
FatherID
,MotherID
)
,motherGroups
AS
(
SELECT
FatherID
,CHECKSUM_AGG(MotherID) AS MotherGroup
FROM distinctParentCombinations
GROUP BY
FatherID
)
SELECT COUNT(DISTINCT MotherGroup) AS UniqueMotherGroups
FROM motherGroups
;
Result
+--------------------+
| UniqueMotherGroups |
+--------------------+
| 3 |
+--------------------+
You need to compare performance of all methods on your actual data.
Obviously, with CHECKSUM_AGG it is possible that some of the groups will be missed. There is a chance that two different groups will generate the same checksum.
You know better if this is acceptable.
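Here is a Python sketch of how an order-independent checksum can merge two different groups. It uses a plain XOR aggregate as an assumed stand-in chosen for illustration. It is not claimed to be CHECKSUM_AGG's internal algorithm.

```python
from functools import reduce
from operator import xor

def xor_checksum(mother_ids):
    """Order-independent XOR over the distinct IDs (illustrative stand-in only)."""
    return reduce(xor, set(mother_ids), 0)

group_a = {1, 2, 3}
group_b = {1, 4, 5}
collides = xor_checksum(group_a) == xor_checksum(group_b)
```

Both groups reduce to 0, so a COUNT(DISTINCT ...) over these checksums would report one group where there are two.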
A general way to speed up calculations is to have some of the results pre-calculated. In your case, for the first part you can create an indexed view as follows:
CREATE OR ALTER VIEW vw_distinctParentCombinations WITH SCHEMABINDING AS
SELECT children.FatherID
, children.MotherID
,COUNT_BIG(*) AS [wifes_count]
FROM dbo.Child as children
GROUP BY children.FatherID
, children.MotherID
GO
CREATE UNIQUE CLUSTERED INDEX IX_vw_distinctParentCombinations ON vw_distinctParentCombinations
(
FatherID,MotherID
);
Then in your initial query, you can avoid the first CTE:
-- CTE gets the group of mothers for each father
WITH motherGroups (Mothers)
AS
(SELECT STRING_AGG(CONVERT(VARCHAR(MAX), distinctParentCombinations.MotherID), '-') WITHIN GROUP (ORDER BY distinctParentCombinations.MotherID) AS Mothers
FROM vw_distinctParentCombinations distinctParentCombinations WITH(NOEXPAND)
GROUP BY distinctParentCombinations.FatherID
)
-- Remove the COUNT function to see the actual combinations
SELECT COUNT(DISTINCT motherGroups.Mothers) AS UniqueMotherGroups
FROM motherGroups;
This avoids the initial read of the large table, and depending on how many distinct (father, mother) pairs there are, the view can be significantly smaller than the table.
Unfortunately, there are a lot of limitations in order to create an indexed view, and you are not able to create such for the second CTE.
If we look at this issue from a different angle, we can simply get the count of mothers with this query:
SELECT Count(distinct ConcatMothers) UniqueMothersCount from(
SELECT FatherID, concat(FatherID,'-',SUM(MotherID)) ConcatMothers
FROM dbo.Child
GROUP BY FatherID) t;
Or you can even use DENSE_RANK() like this:
SELECT Max(RankMothers) UniqueMothersCount from(
SELECT FatherID, DENSE_RANK() over (order by concat(FatherID,'-',SUM(MotherID))) RankMothers
FROM dbo.Child
GROUP BY FatherID) t;
Performance is hard to measure because the dataset is small, but since there is only one column in the GROUP BY and MotherID appears only in the SELECT, maybe we can change the index as below:
CREATE NONCLUSTERED INDEX IX_Parents ON dbo.Child (FatherID) Include(MotherID);
but you need to check it on your dataset.

How to split data in SQL Server table row

I have a table of transactions which contains a column transactionId that has values like |H000021|B1|.
I need to join it with the table Category, which has a column CategoryID with values like H000021.
I cannot apply the join unless the data is the same.
So I want to split off or remove the unnecessary data in transactionId so that I can join both tables.
Kindly help me with the solutions.
Create a computed column with the code only.
Initial scenario:
create table Transactions
(
transactionId varchar(12) primary key,
whatever varchar(100)
)
create table Category
(
transactionId varchar(7) primary key,
name varchar(100)
)
insert into Transactions
select'|H000021|B1|', 'Anything'
insert into Category
select 'H000021', 'A category'
Add computed column:
alter table Transactions add transactionId_code as substring(transactionid, 2, 7) persisted
Join using the new computed column:
select *
from Transactions t
inner join Category c on t.transactionId_code = c.transactionId
This gives a straightforward query plan.
You should fix your data so the columns are the same. But sometimes we are stuck with other people's bad design decisions. In particular, the transaction data should contain a column for the category -- even if the category is part of the id.
In any case:
select . . .
from [transaction] t join
category c
on transactionid like '|' + categoryid + '|%';
Or if the category id is always 7 characters:
select . . .
from [transaction] t join
category c
on categoryid = substring(transactionid, 2, 7)
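T-SQL's SUBSTRING is 1-based, so SUBSTRING(transactionid, 2, 7) corresponds to the Python slice [1:8]. Below is a quick Python check of both join predicates on the example value. The split-on-'|' variant is an added alternative for IDs of varying length, not something from the answers.

```python
transaction_id = "|H000021|B1|"
category_id = "H000021"

# on categoryid = substring(transactionid, 2, 7): skip the leading '|', take 7 chars
substring_match = transaction_id[1:8] == category_id
# on transactionid like '|' + categoryid + '|%': a prefix test
like_match = transaction_id.startswith("|" + category_id + "|")
# If category IDs vary in length, taking the first '|'-delimited field avoids a fixed width
split_match = transaction_id.split("|")[1] == category_id
```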
You can do this with a query like this:
CREATE TABLE #MyTable
(PrimaryKey int PRIMARY KEY,
KeyTransacFull varchar(50)
);
GO
CREATE TABLE #MyTransaction
(PrimaryKey int PRIMARY KEY,
KeyTransac varchar(50)
);
GO
INSERT INTO #MyTable
SELECT 1, '|H000021|B1|'
INSERT INTO #MyTable
SELECT 2, '|H000021|B1|'
INSERT INTO #MyTransaction
SELECT 1, 'H000021'
SELECT * FROM #MyTable
SELECT * FROM #MyTransaction
SELECT *
FROM #MyTable
JOIN #MyTransaction ON KeyTransacFull LIKE '|'+KeyTransac+'|%'
DROP TABLE #MyTable
DROP TABLE #MyTransaction

Make a copy of parent-child structure in SQL

I have a table MODELS to which several ITEMS can belong. The ITEMS table is a hierarchical table with a self join on the PARENT column. Root level items will have Null in PARENT. Items can go to any level deep.
create table MODELS (
MODELID int identity,
MODELNAME nvarchar(200) not null,
constraint PK_MODELS primary key (MODELID)
)
go
create table ITEMS (
ITEMID int identity,
MODELID int not null,
PARENT int null,
ITEMNUM nvarchar(20) not null,
constraint PK_ITEMS primary key (ITEMID)
)
go
alter table ITEMS
add constraint FK_ITEMS_MODEL foreign key (MODELID)
references MODELS (MODELID)
go
alter table ITEMS
add constraint FK_ITEMS_ITEMS foreign key (PARENT)
references ITEMS (ITEMID)
go
I wish to create stored procedure to copy a row in the MODELS table into a new row and also copy the entire structure in ITEMS as well.
For example, if I have the following in ITEMS:
ITEMID MODELID PARENT ITEMNUM
1 1 Null A
2 1 Null B
3 1 Null C
4 1 1 A.A
5 1 2 B.B
6 1 4 A.A.A
7 1 4 A.A.B
8 1 3 C.A
9 1 3 C.B
10 1 9 C.B.A
I'd like to create a new Model row and copies of the 10 Items, as follows:
ITEMID MODELID PARENT ITEMNUM
11 2 Null A
12 2 Null B
13 2 Null C
14 2 11 A.A
15 2 12 B.B
16 2 14 A.A.A
17 2 14 A.A.B
18 2 13 C.A
19 2 13 C.B
20 2 19 C.B.A
I will pass the MODELID to be copied as a parameter to the Stored Procedure. The tricky part is setting the PARENT column correctly. I think this will need to be done recursively.
Any suggestions?
The solution described here will work correctly in multi-user environment. You don't need to lock the whole table. You don't need to disable self-referencing foreign key. You don't need recursion.
(Ab)use MERGE with the OUTPUT clause.
MERGE can INSERT, UPDATE and DELETE rows. In our case we only need to INSERT. 1=0 is always false, so the NOT MATCHED BY TARGET branch is always executed. In general, there could be other branches, see the docs. WHEN MATCHED is usually used to UPDATE, and WHEN NOT MATCHED BY SOURCE is usually used to DELETE, but we don't need them here.
This convoluted form of MERGE is equivalent to a simple INSERT. Unlike a simple INSERT, however, its OUTPUT clause can refer to columns from both the source and the destination tables, which lets us save a mapping between old and new IDs.
sample data
DECLARE @Items TABLE (
ITEMID int identity,
MODELID int not null,
PARENT int null,
ITEMNUM nvarchar(20) not null
)
INSERT INTO @Items (MODELID, PARENT, ITEMNUM) VALUES
(1, Null, 'A'),
(1, Null, 'B'),
(1, Null, 'C'),
(1, 1 , 'A.A'),
(1, 2 , 'B.B'),
(1, 4 , 'A.A.A'),
(1, 4 , 'A.A.B'),
(1, 3 , 'C.A'),
(1, 3 , 'C.B'),
(1, 9 , 'C.B.A');
I omit the code that duplicates the Model row. Eventually you'll have ID of original Model and new Model.
DECLARE @SrcModelID int = 1;
DECLARE @DstModelID int = 2;
Declare a table variable (or temp table) to hold the mapping between old and new item IDs.
DECLARE @T TABLE(OldItemID int, NewItemID int);
Make a copy of Items remembering the mapping of IDs in the table variable and keeping old PARENT values.
MERGE INTO @Items
USING
(
SELECT ITEMID, PARENT, ITEMNUM
FROM @Items AS I
WHERE MODELID = @SrcModelID
) AS Src
ON 1 = 0
WHEN NOT MATCHED BY TARGET THEN
INSERT (MODELID, PARENT, ITEMNUM)
VALUES
(@DstModelID
,Src.PARENT
,Src.ITEMNUM)
OUTPUT Src.ITEMID AS OldItemID, inserted.ITEMID AS NewItemID
INTO @T(OldItemID, NewItemID)
;
Update old PARENT values with new IDs
WITH
CTE
AS
(
SELECT I.ITEMID, I.PARENT, T.NewItemID
FROM
@Items AS I
INNER JOIN @T AS T ON T.OldItemID = I.PARENT
WHERE I.MODELID = @DstModelID
)
UPDATE CTE
SET PARENT = NewItemID
;
Check the results
SELECT * FROM @Items;
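The two-step idea above has two parts. First, copy the rows while recording old-to-new IDs (the MERGE ... OUTPUT). Then rewrite PARENT through that map (the UPDATE). Below is a Python model of it. The identity counter and names are stand-ins, not the answer's code. No recursion is needed because the whole map exists before any PARENT is rewritten.

```python
from itertools import count

# (ITEMID, MODELID, PARENT, ITEMNUM) rows from the question
ITEMS = [
    (1, 1, None, "A"), (2, 1, None, "B"), (3, 1, None, "C"),
    (4, 1, 1, "A.A"), (5, 1, 2, "B.B"), (6, 1, 4, "A.A.A"),
    (7, 1, 4, "A.A.B"), (8, 1, 3, "C.A"), (9, 1, 3, "C.B"), (10, 1, 9, "C.B.A"),
]

def copy_model(items, src_model, dst_model):
    next_id = count(max(i[0] for i in items) + 1)  # stand-in for the IDENTITY column
    old_to_new = {}
    copies = []
    # Step 1: insert copies that keep the OLD parent and record old -> new IDs.
    for item_id, model_id, parent, item_num in items:
        if model_id == src_model:
            new_id = next(next_id)
            old_to_new[item_id] = new_id
            copies.append([new_id, dst_model, parent, item_num])
    # Step 2: translate every old parent ID to the new one through the mapping.
    for row in copies:
        if row[2] is not None:
            row[2] = old_to_new[row[2]]
    return [tuple(r) for r in copies]

new_items = copy_model(ITEMS, 1, 2)
```

The result matches the expected table in the question: (11, 2, None, 'A') through (20, 2, 19, 'C.B.A').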
You can do it without recursion, but you may need to lock the table first to be sure that it works correctly.
insert into items (Modelid, Parent, ITEMNUM)
select 2 as modelId,
MAP.currId as Parent,
MO.ITEMNUM
from (
( select * from items where MODELID = 1) MO
left join
( select IDENT_CURRENT('ITEMS') + ROW_NUMBER() OVER(ORDER BY itemid ) currID ,
i.ItemID
from ITEMS i
where modelid = 1 ) MAP
on MO.Parent= MAP.ItemID
) ORDER BY MO.ItemID
The idea behind it is that we select all rows of the original model from the ITEMS table and generate a fake ID for each of them.
The fake ID is:
Row 1 = current identity + 1,
Row 2 = current identity + 2,
etc.
After that we have a mapping: oldid -> newid.
Then we insert the original model's rows into the ITEMS table as they are, but replace Parent with the value from our mapping.
The issue I can see is that some ItemID may not exist yet when a row that uses it as Parent is inserted (i.e., we insert a row that will get ItemID 20 but its Parent is 21). For that we may need to disable the constraint on Parent while this insert runs, and enable it again afterwards. The data will be correct, of course.

EAV Select query from spreaded value tables

I have the following SQL Server database structure I have to use to query data. The model could be wrong; I appreciate arguments if that's the case so I can ask for changes. If not, I need a query to get tabbed data in the format I will detail below.
The structure goes like this:
CLIENTS:
ClientID ClientName
-----------------------
1 James
2 Leonard
3 Montgomery
ATTRIBUTES:
AttributeID AttributeName
-----------------------------
1 Rank
2 Date
3 Salary
4 FileRecordsAmount
ATTRIBUTES_STRING:
ClientID AttributeID AttributeStringValue
1 1 Captain
2 1 Chief Surgeon
3 1 Chief Engineer
ATTRIBUTES_NUMERIC:
ClientID AttributeID AttributeNumericValue
1 4 187
2 4 2
3 4 10
The result I need would be the following:
RESULTS:
----------------------------------------------------------
ClientID ClientName Rank FileRecordsAmount
1 James Captain 187
2 Leonard Chief Surgeon 2
3 Montgomery Chief Engineer 10
How can I achieve this?
Thank you very much!
EDIT: The challenging issue here (for me) is that the attributes are dynamic... I have 5 tables of attributes (ATTRIBUTES_STRING, ATTRIBUTES_NUMERIC, ATTRIBUTES_DATE, ATTRIBUTES_BIT, ATTRIBUTES_INT) and users should be able to set up their own attributes.
You need an SQL join. It will look something like this:
select
CLIENTS.ClientID,
CLIENTS.ClientName,
ATTRIBUTES_STRING1.AttributeStringValue as Rank,
ATTRIBUTES_NUMERIC2.AttributeNumericValue as FileRecordsAmount
from
CLIENTS,
ATTRIBUTES ATTRIBUTES1,
ATTRIBUTES ATTRIBUTES2,
ATTRIBUTES_STRING ATTRIBUTES_STRING1,
ATTRIBUTES_NUMERIC ATTRIBUTES_NUMERIC2
where CLIENTS.ClientID = ATTRIBUTES_STRING1.ClientID
and CLIENTS.ClientID = ATTRIBUTES_NUMERIC2.ClientID
and ATTRIBUTES_STRING1.AttributeID = ATTRIBUTES1.AttributeID
and ATTRIBUTES_NUMERIC2.AttributeID = ATTRIBUTES2.AttributeID
and ATTRIBUTES1.AttributeName = 'Rank'
and ATTRIBUTES2.AttributeName = 'FileRecordsAmount'
;
Here is the SQL Fiddle for reference. This is my first EAV schema so I wouldn't put too much trust in it :)
Edit: Schema provided below for reference:
create table CLIENTS (
ClientID integer primary key,
ClientName varchar(50) not null
);
insert into CLIENTS values (1,'James');
insert into CLIENTS values (2,'Leonard');
insert into CLIENTS values (3,'Montgomery');
create table ATTRIBUTES (
AttributeID integer primary key,
AttributeName varchar(50) not null
);
create index ATTRIBUTE_NAME_IDX on ATTRIBUTES (AttributeName);
insert into ATTRIBUTES values (1,'Rank');
insert into ATTRIBUTES values (2,'Date');
insert into ATTRIBUTES values (3,'Salary');
insert into ATTRIBUTES values (4,'FileRecordsAmount');
create table ATTRIBUTES_STRING (
ClientID integer,
AttributeID integer not null,
AttributeStringValue varchar(255) not null,
primary key (ClientID, AttributeID)
);
insert into ATTRIBUTES_STRING values (1,1,'Captain');
insert into ATTRIBUTES_STRING values (2,1,'Chief Surgeon');
insert into ATTRIBUTES_STRING values (3,1,'Chief Engineer');
create table ATTRIBUTES_NUMERIC (
ClientID integer,
AttributeID integer not null,
AttributeNumericValue numeric(10, 5) not null,
primary key (ClientID, AttributeID)
);
insert into ATTRIBUTES_NUMERIC values (1,4,187);
insert into ATTRIBUTES_NUMERIC values (2,4,2);
insert into ATTRIBUTES_NUMERIC values (3,4,10);
Edit: Modified the select to make it easier to extend with extra attributes
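Because the question says attributes are user-defined, the column list cannot be hard-coded. A hedged T-SQL sketch (SQL Server 2017+ for STRING_AGG) builds a conditional-aggregation pivot from ATTRIBUTES at run time. To stay self-contained it loads the question's sample data into temp tables (#CLIENTS and so on) mirroring the schema above; against the real tables, drop the #. Only the string and numeric tables are unioned here, and all values are cast to nvarchar so they can share one column, which is a trade-off of this approach:

```sql
-- Sample data mirroring the schema above, in temp tables so nothing collides.
CREATE TABLE #CLIENTS (ClientID int PRIMARY KEY, ClientName varchar(50) NOT NULL);
CREATE TABLE #ATTRIBUTES (AttributeID int PRIMARY KEY, AttributeName varchar(50) NOT NULL);
CREATE TABLE #ATTRIBUTES_STRING (ClientID int, AttributeID int, AttributeStringValue varchar(255));
CREATE TABLE #ATTRIBUTES_NUMERIC (ClientID int, AttributeID int, AttributeNumericValue numeric(10, 5));
INSERT INTO #CLIENTS VALUES (1, 'James'), (2, 'Leonard'), (3, 'Montgomery');
INSERT INTO #ATTRIBUTES VALUES (1, 'Rank'), (2, 'Date'), (3, 'Salary'), (4, 'FileRecordsAmount');
INSERT INTO #ATTRIBUTES_STRING VALUES (1, 1, 'Captain'), (2, 1, 'Chief Surgeon'), (3, 1, 'Chief Engineer');
INSERT INTO #ATTRIBUTES_NUMERIC VALUES (1, 4, 187), (2, 4, 2), (3, 4, 10);

DECLARE @cols nvarchar(max), @sql nvarchar(max);

-- One MAX(CASE ...) column per attribute that has at least one value;
-- QUOTENAME guards against odd user-chosen attribute names.
SELECT @cols = STRING_AGG(
           CAST(N'MAX(CASE WHEN v.AttributeID = ' + CAST(a.AttributeID AS nvarchar(10))
                + N' THEN v.AttrValue END) AS ' + QUOTENAME(a.AttributeName) AS nvarchar(max)),
           N', ') WITHIN GROUP (ORDER BY a.AttributeID)
FROM #ATTRIBUTES a
WHERE a.AttributeID IN (SELECT AttributeID FROM #ATTRIBUTES_STRING
                        UNION
                        SELECT AttributeID FROM #ATTRIBUTES_NUMERIC);

-- ATTRIBUTES_DATE, _BIT and _INT would be further UNION ALL branches.
-- The LEFT JOIN keeps clients that have no attribute values at all.
SET @sql = N'SELECT c.ClientID, c.ClientName, ' + @cols + N'
FROM #CLIENTS c
LEFT JOIN (
    SELECT ClientID, AttributeID, CAST(AttributeStringValue AS nvarchar(255)) AS AttrValue
    FROM #ATTRIBUTES_STRING
    UNION ALL
    SELECT ClientID, AttributeID, CAST(AttributeNumericValue AS nvarchar(255))
    FROM #ATTRIBUTES_NUMERIC
) v ON v.ClientID = c.ClientID
GROUP BY c.ClientID, c.ClientName
ORDER BY c.ClientID;';

EXEC sp_executesql @sql;
```

Since each (ClientID, AttributeID) pair holds at most one value, MAX simply picks that value; a client missing an attribute gets NULL in that column instead of disappearing, which the inner-join version above would do.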

Duplicating records efficiently in tsql

I have a scenario with a parent table that has one-to-many relationships with two or three tables. These child tables in turn have one-to-many relationships with more tables, and so on, up to 5 or 6 levels of hierarchy.
Now, given a single primary key value of the parent table, I want to duplicate all the information related to it in the database. I wrote a stored procedure that uses cursors to insert child rows one by one, setting the new foreign key values with each insert, but it takes a long time because the child tables contain many records.
Is there any other efficient way to do this?
In SQL Server 2008:
CREATE TABLE t_parent (id INT NOT NULL PRIMARY KEY IDENTITY, value VARCHAR(100))
CREATE TABLE t_child (id INT NOT NULL PRIMARY KEY IDENTITY, parent INT NOT NULL, value VARCHAR(100))
CREATE TABLE t_grandchild (id INT NOT NULL PRIMARY KEY IDENTITY, child INT NOT NULL, value VARCHAR(100))
INSERT
INTO t_parent (value)
VALUES ('Parent 1')
INSERT
INTO t_parent (value)
VALUES ('Parent 2')
INSERT
INTO t_child (parent, value)
VALUES (1, 'Child 1')
INSERT
INTO t_child (parent, value)
VALUES (2, 'Child 2')
INSERT
INTO t_grandchild (child, value)
VALUES (1, 'Grandchild 1')
INSERT
INTO t_grandchild (child, value)
VALUES (1, 'Grandchild 2')
INSERT
INTO t_grandchild (child, value)
VALUES (2, 'Grandchild 3')
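-- Old-to-new id mappings, one per level: a single shared table would let a
-- parent id match a child id with the same value and duplicate grandchildren.
DECLARE @parent_map TABLE (oid INT, nid INT);
DECLARE @child_map TABLE (oid INT, nid INT);
-- ON 1 = 0 never matches, so every source row is inserted; unlike
-- INSERT ... OUTPUT, MERGE's OUTPUT can reference source columns (p.id).
MERGE
INTO t_parent
USING (
SELECT id, value
FROM t_parent
WHERE id = 1 -- the parent row to duplicate
) p
ON 1 = 0
WHEN NOT MATCHED THEN
INSERT (value)
VALUES (value)
OUTPUT p.id, INSERTED.id
INTO @parent_map;
MERGE
INTO t_child
USING (
SELECT c.id, p.nid, c.value
FROM @parent_map p
JOIN t_child c
ON c.parent = p.oid
) c
ON 1 = 0
WHEN NOT MATCHED THEN
INSERT (parent, value)
VALUES (nid, value)
OUTPUT c.id, INSERTED.id
INTO @child_map;
-- The last level needs no mapping, so a plain INSERT ... SELECT is enough.
INSERT
INTO t_grandchild (child, value)
SELECT c.nid, gc.value
FROM @child_map c
JOIN t_grandchild gc
ON gc.child = c.oid;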
In earlier versions of SQL Server, you will have to do a SELECT followed by an INSERT to find out the new values of the PRIMARY KEY.
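For the first two levels, a pre-2008 version can get by without a mapping table: copy the single parent row, read its new key with SCOPE_IDENTITY(), and copy all its children in one set-based INSERT ... SELECT. A hedged sketch using temp tables #parent and #child (hypothetical names mirroring t_parent and t_child), written without multi-row VALUES or inline DECLARE initializers so it should also run on SQL Server 2005:

```sql
CREATE TABLE #parent (id int NOT NULL PRIMARY KEY IDENTITY, value varchar(100));
CREATE TABLE #child (id int NOT NULL PRIMARY KEY IDENTITY, parent int NOT NULL, value varchar(100));
INSERT INTO #parent (value) VALUES ('Parent 1');
INSERT INTO #child (parent, value) VALUES (1, 'Child 1');
INSERT INTO #child (parent, value) VALUES (1, 'Child 2');

DECLARE @old_id int, @new_id int;
SET @old_id = 1;  -- the parent row to duplicate

-- Copy the parent row and read back the identity value it was given.
INSERT INTO #parent (value) SELECT value FROM #parent WHERE id = @old_id;
SET @new_id = SCOPE_IDENTITY();

-- The children can then be copied as one set, pointing at the new parent.
INSERT INTO #child (parent, value)
SELECT @new_id, value FROM #child WHERE parent = @old_id;
```

Below the second level each new child id must be matched back to its old id, which is the per-row mapping MERGE ... OUTPUT supplies; INSERT ... OUTPUT can return only INSERTED columns, not source columns, so it cannot provide that mapping on its own.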
You'll have to insert one table at a time, but you can do it by inserting sets instead of rows if you allow the FK values in the new parent's child tables to be the same as the FK values of the original parent.
Say you have a view of your parent table, and in your stored procedure you limit it to the row to copy from (PK = 1, say).
Then insert that row into the parent table, substituting 2 for its PK value.
Now use a second view of one of the child tables. In your stored procedure, limit it to the rows that belong to PK = 1. Again, insert all those rows into that same child table, substituting 2 for the parent key value.
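One reading of this answer is that each child table carries the root key as part of its own key, e.g. (doc_id, line_no), so every level can be copied set-based by swapping only that root value. A hedged sketch under that assumption, filtering with WHERE instead of views; the #doc, #doc_line and #doc_line_note tables and their columns are hypothetical:

```sql
CREATE TABLE #doc (doc_id int NOT NULL PRIMARY KEY, title varchar(100));
CREATE TABLE #doc_line (
    doc_id    int NOT NULL,
    line_no   int NOT NULL,
    line_text varchar(100),
    PRIMARY KEY (doc_id, line_no)
);
CREATE TABLE #doc_line_note (
    doc_id  int NOT NULL,
    line_no int NOT NULL,
    note_no int NOT NULL,
    note    varchar(100),
    PRIMARY KEY (doc_id, line_no, note_no)
);
INSERT INTO #doc VALUES (1, 'original');
INSERT INTO #doc_line VALUES (1, 1, 'first line');
INSERT INTO #doc_line VALUES (1, 2, 'second line');
INSERT INTO #doc_line_note VALUES (1, 2, 1, 'a note');

DECLARE @src int, @dst int;
SET @src = 1;  -- PK of the row to copy from
SET @dst = 2;  -- PK the copy will get

-- One set-based INSERT per table, parent first; only the root key changes,
-- so no old-to-new mapping is needed at any level.
INSERT INTO #doc (doc_id, title)
SELECT @dst, title FROM #doc WHERE doc_id = @src;
INSERT INTO #doc_line (doc_id, line_no, line_text)
SELECT @dst, line_no, line_text FROM #doc_line WHERE doc_id = @src;
INSERT INTO #doc_line_note (doc_id, line_no, note_no, note)
SELECT @dst, line_no, note_no, note FROM #doc_line_note WHERE doc_id = @src;
```

The design choice is the composite key: because a child row is identified relative to its root, copying never has to discover generated identity values, which is what makes the cursor unnecessary.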