Find sql connected component between many to many entities - sql

I have two basic entities: financial plan and purchase request. Theese two entities are in many-to-many relationship:
CREATE TABLE FinancialPlan
(
ID int NOT NULL,
PRIMARY KEY (ID)
);
CREATE TABLE PurchaseRequest
(
ID int NOT NULL,
PRIMARY KEY (ID)
);
CREATE TABLE FP_PR
(
FP_ID FOREIGN KEY REFERENCES FinancialPlan(ID),
PR_ID FOREIGN KEY REFERENCES PurchaseRequest(ID)
);
Problem: find all requests, related to specified plan, and all plans, related to requests, related to specified plan, ...
Model could be represented as a graph, where each node represents a plan, or a request, and each edge represents a relationship, then the problem could be rephrased as find connected component, which specified node belongs to.
Example:
Plan Request FP_PR
ID | ID | FP_ID|PR_ID|
----| ----| -----|-----|
1 | 1 | 1 |1 |
2 | 2 | 2 |1 |
3 | 3 | 2 |2 |
4 | 3 |2 |
5 | 4 |2 |
5 |3 |
Find connected component of finplan ID=1
Desired output:
FP_ID | PR_ID|
------+------+
1 | 1 |
2 | 1 |
2 | 2 |
3 | 2 |
4 | 2 |
I am currently doing it recursively on app side, which may generate to many requests and hang the DB server, could this be done with some recursive DB approach?
Visualization:
Starting entity is marked by arrow.
Desired output is circled.

SQL Server solution
I guess the main problem is you need to compare by PR_ID then FP_ID. So in recursive part there must be a CASE statement. On 1 run we take data by FP_ID on second by PR_ID and etc with the help of modulo.
DECLARE #fp int = 1
;WITH cte AS (
SELECT f.FP_ID,
f.PR_ID,
1 as lev
FROM #FP_PR f
WHERE f.FP_id = #fp
UNION ALL
SELECT f.FP_ID,
f.PR_ID,
lev+1
FROM cte c
CROSS JOIN #FP_PR f -- You can use INNER JOIN instead
WHERE CASE (lev+1)%2 WHEN 0 THEN f.PR_ID WHEN 1 THEN f.FP_ID END = CASE (lev+1)%2 WHEN 0 THEN c.PR_ID WHEN 1 THEN c.FP_ID END
AND NOT (f.PR_ID = c.PR_ID AND f.FP_ID = c.FP_ID)
)
SELECT *
FROM cte
Output:
FP_ID PR_ID lev
1 1 1
2 1 2
2 2 3
3 2 4
4 2 4

Related

Find best match in tree given a combination of multiple keys

I have a structure / tree that looks similar to this.
CostType is mandatory and can exist by itself, but it can have a parent ProfitType or Unit and other CostTypes as children.
There can only be duplicate Units. Other cannot appear multiple times in the structure.
| ID | name | parent_id | ProfitType | CostType | Unit |
| -: | ------------- | --------: |
| 1 | Root | (NULL) |
| 2 | 1 | 1 | 300 | | |
| 3 | 1-1 | 2 | | 111 | |
| 4 | 1-1-1 | 3 | | | 8 |
| 5 | 1-2 | 2 | | 222 | |
| 6 | 1-2-1 | 5 | | 333 | |
| 7 | 1-2-1-1 | 6 | | | 8 |
| 8 | 1-2-1-2 | 6 | | | 9 |
Parameters | should RETURN |
(300,111,8) | 4 |
(null,111,8) | 4 |
(null,null,8) | first match, 4 |
(null,222,8) | best match, 5 |
(null,333,null) | 6 |
I am at a loss on how I could create a function that receives (ProfitType, CostType, Unit) and return the best matching ID from the structure.
This isn't giving exactly the answers you provided as example, but see my comment above - if (null,222,8) should be 7 to match how (null,333,8) returns 4 then this is correct.
Also note that I formatted this using temp tables instead of as a function, I don't want to trip a schema change audit so I posted what I have as temp tables, I can rewrite it as a function Monday when my DBA is available, but I thought you might need it before the weekend. Just edit the "DECLARE #ProfitType int = ..." lines to the values you want to test
I also put in quite a few comments because the logic is tricky, but if they aren't enough leave a comment and I can expand my explanation
/*
ASSUMPTIONS:
A tree can be of arbitrary depth, but will not exceed the recursion limit (defaults to 100)
All trees will include at least 1 CostType
All trees will have at most 1 ProfitType
CostType can appear multiple times in a traversal from root to leaf (can units?)
*/
SELECT *
INTO #Temp
FROM (VALUES (1,'Root',NULL, NULL, NULL, NULL)
, (2,'1', 1, 300, NULL, NULL)
, (3,'1-1', 2, NULL, 111, NULL)
, (4,'1-1-1', 3, NULL, NULL, 8)
, (5,'1-2', 2, NULL, 222, NULL)
, (6,'1-2-1', 5, NULL, 333, NULL)
, (7,'1-2-1-1', 6, NULL, NULL, 8)
, (8,'1-2-1-2', 6, NULL, NULL, 9)
) as TempTable(ID, RName, Parent_ID, ProfitType, CostType, UnitID)
--SELECT * FROM #Temp
DECLARE #ProfitType int = NULL--300
DECLARE #CostType INT = 333 --NULL --111
DECLARE #UnitID INT = NULL--8
--SELECT * FROM #Temp
;WITH cteMatches as (
--Start with all nodes that match one criteria, default a score of 100
SELECT N.ID as ReportID, *, 100 as Score, 1 as Depth
FROM #Temp AS N
WHERE N.CostType= #CostType OR N.ProfitType=#ProfitType OR N.UnitID = #UnitID
), cteEval as (
--This is a recursive CTE, it has a (default) limit of 100 recursions
--, but that can be raised if your trees are deeper than 100 nodes
--Start with the base case
SELECT M.ReportID, M.RName, M.ID ,M.Parent_ID, M.Score
, M.Depth, M.ProfitType , M.CostType , M.UnitID
FROM cteMatches as M
UNION ALL
--This is the recursive part, add to the list of matches the match when
--its immediate parent is also considered. For that match increase the score
--if the parent contributes another match. Also update the ID of the match
--to the parent's IDs so recursion can keep adding if more matches are found
SELECT M.ReportID, M.RName, N.ID ,N.Parent_ID
, M.Score + CASE WHEN N.CostType= #CostType
OR N.ProfitType=#ProfitType
OR N.UnitID = #UnitID THEN 100 ELSE 0 END as Score
, M.Depth + 1, N.ProfitType , N.CostType , N.UnitID
FROM cteEval as M INNER JOIN #Temp AS N on M.Parent_ID = N.ID
)SELECT TOP 1 * --Drop the "TOP 1 *" to see debugging info (runners up)
FROM cteEval
ORDER BY SCORE DESC, DEPTH
DROP TABLE #Temp
I'm sorry I don't have enough rep to comment.
You'll have to define "best answer" (for example, why isn't the answer to null,222,8 7 or null instead of 5?), but here's the approach I'd use:
Derive a new table where ProfitType and CostType are listed explicitly instead of only by inheritance. I would approach that by using a cursor (how awful, I know) and following the parent_id until a ProfitType and CostType is found -- or the root is reached. This presumes an unlimited amount of child/grandchild levels for parent_id. If there is a limit, then you can instead use N self joins where N is the number of parent_id levels allowed.
Then you run multiple queries against the derived table. The first query would be for an exact match (and then exit if found). Then next query would be for the "best" partial match (then exit if found), followed by queries for 2nd best, 3rd best, etc. until you've exhausted your "best" match criteria.
If you need nested parent CostTypes to be part of the "best match" criteria, then I would make duplicate entries in the derived table for each row that has multiple CostTypes with a CostType "level". level 1 is the actual CostType. level 2 is that CostType's parent, level 3 etc. Then your best match queries would return multiple rows and you'd need to pick the row with the lowest level (which is the closest parent/grandparent).

Best Way to Join One Column on Columns From Two Other Tables

I have a schema like the following in Oracle
Section:
+--------+----------+
| sec_ID | group_ID |
+--------+----------+
| 1 | 1 |
| 2 | 1 |
| 3 | 2 |
| 4 | 2 |
+--------+----------+
Section_to_Item:
+--------+---------+
| sec_ID | item_ID |
+--------+---------+
| 1 | 1 |
| 1 | 2 |
| 2 | 3 |
| 2 | 4 |
+--------+---------+
Item:
+---------+------+
| item_ID | data |
+---------+------+
| 1 | a |
| 2 | b |
| 3 | c |
| 4 | d |
+---------+------+
Item_Version:
+---------+----------+--------+
| item_ID | start_ID | end_ID |
+---------+----------+--------+
| 1 | 1 | |
| 2 | 1 | 3 |
| 3 | 2 | |
| 4 | 1 | 2 |
+---------+----------+--------+
Section_to_Item has FK into Section and Item on the *_ID columns.
Item_version is indexed on item_ID but has no FK to Item.item_ID (ran out of space in the snapshot group).
I have code that receives a list of version IDs and I want to get all items in sections in a given group that are valid for at least one of the versions passed in. If an item has no end_ID, it's valid for anything starting with start_ID. If it has an end_id, it's valid for anything up until (not including) end_ID.
What I currently have is:
SELECT Items.data
FROM Section, Section_to_Items, Item, Item_Version
WHERE Section.group_ID = 1
AND Section_to_Item.sec_ID = Section.sec_ID
AND Item.item_ID = Section_to_Item.item_ID
AND Item.item_ID = Item_Version.item_ID
AND exists (
SELECT *
FROM (
SELECT 2 AS version FROM DUAL
UNION ALL SELECT 3 AS version FROM DUAL
) passed_versions
WHERE Item_Version.start_ID <= passed_versions.version
AND (Item_Version.end_ID IS NULL or Item_Version.end_ID > passed_version.version)
)
Note that the UNION ALL statement is dynamically generated from the list of passed in versions.
This query currently does a cartesian join and is very slow.
For some reason, if I change the query to join
AND Item_Version.item_ID = Section_to_Item.item_ID
which is not a FK, the query does not do the cartesian join and is much faster.
A) Can anyone explain why this is?
B) Is this the right way to be joining this sequence of tables (I feel weird about joining Item.item_ID to two different tables)
C) Is this the right way to get versions between start_ID and end_ID?
Edit
Same query with inner join syntax:
SELECT Items.data
FROM Item
INNER JOIN Section_to_Items ON Section_to_Items.item_ID = Item.item_ID
INNER JOIN Section ON Section.sec_ID = Section_to_Items.sec_ID
INNER JOIN Item_Version ON Item_Version.item_ID = Item_.item_ID
WHERE Section.group_ID = 1
AND exists (
SELECT *
FROM (
SELECT 2 AS version FROM DUAL
UNION ALL SELECT 3 AS version FROM DUAL
) passed_versions
WHERE Item_Version.start_ID <= passed_versions.version
AND (Item_Version.end_ID IS NULL or Item_Version.end_ID > passed_version.version)
)
Note that in this case the performance difference comes from joining on Item_Version first and then joining Section_to_Item on Item_Version.item_ID.
In terms of table size, Section_to_Item, Item, and Item_Version should be similar (1000s) while Section should be small.
Edit
I just found out that apparently, the schema has no FKs. The FKs specified in the schema configuration files are ignored. They're just there for documentation. So there's no difference between joining on a FK column or not. That being said, by changing the joins into a cascade of SELECT INs, I'm able to avoid joining the entire Item table twice. I don't love the resulting query, and I don't really understand the difference, but the stats indicate it's much less work (changes the A-Rows returned from the inner most scan on Section from 656,000 to 488 (it used to be 656k starts returning 1 row, now it's 488 starts returning 1 row)).
Edit
It turned out to be stale statistics - the two queries were equivalent the whole time but with the incomplete statistics, the DB happened to notice the correct plan only in the second instance. After updating statistics, both queries generated the same plan.
I'm not sure if this is the best idea but this seems to avoid the cartesian join:
select data
from Item
where item_ID in (
select item_ID
from Item_Version
where item_ID in (
select item_ID
from Section_to_Item
where sec_ID in (
select sec_ID
from Section
where group_ID = 1
)
)
and exists (
select 1
from (
select 2 as version
from dual
union all
select 3 as version
from dual
) versions
where versions.version >= start_ID
and (end_ID is null or versions.version <)
)
)

Deleting rows from a table when i pass a string

Hello everyone i want to create a procedure that receives a int and a string with the ID's when they are like this:
Int CompeteID = 1
String TeamIDs = "1,8,9"
Meaning there are 3 TeamIDs, TeamID = 1, TeamID = 8 and TeamID = 9.
Here is my DBModel: https://i.gyazo.com/7920cca8000436cfe207353aaa7d172f.png
So what i want to do is to Insert on TeamCompete, the SportID and CompeteID when there are no equal SportID and CompeteID.
Like this:
TeamCompeteID TeamID CompeteID
1 1 1
4 8 1
5 9 1
6 8 1 <---- Can't do this
But i also want to delete from TeamCompete the TeamIDs i dont pass onto the procedure for example:
TeamCompeteID TeamID CompeteID
1 1 1
2 3 1 <---- Delete this
3 4 1 <---- Delete this
But I don't want to delete the TeamCompete's that are on the Event table...
Example:
EventID TeamCompeteID
5 3 <---- Can't delete TeamCompeteID 3
-- even though i didn't past it on TeamIDs
I hope you understanded my explanation.
First and foremost, passing data in this manner is very much a bad idea. You should really try to normalise your data and have one value in each field.
If however, you cannot avoid this, I would recommend you use a string splitting function such as Jeff Moden's if you are on a SQL Server version prior to 2016 (the string_split is built into SQL Server 2016) which returns a table you can join to from a given string and delimiter.
Using Jeff's function above:
select *
from dbo.DelimitedSplit8K('1,8,9',',')
Results in:
+------------+------+
| ItemNumber | Item |
+------------+------+
| 1 | 1 |
| 2 | 8 |
| 3 | 9 |
+------------+------+
So to use with your table:
declare #t table(CompeteID int,TeamID nvarchar(10));
insert into #t values(1,'1,8,9');
select t.CompeteID
,s.Item
from #t t
cross apply dbo.DelimitedSplit8K(t.TeamID,',') s
Results in:
+-----------+------+
| CompeteID | Item |
+-----------+------+
| 1 | 1 |
| 1 | 8 |
| 1 | 9 |
+-----------+------+
You can then use this table in the rest of your update/insert/delete logic.

Insert into multiple tables

A brief explanation on the relevant domain part:
A Category is composed of four data:
Gender (Male/Female)
Age Division (Mighty Mite to Master)
Belt Color (White to Black)
Weight Division (Rooster to Heavy)
So, Male Adult Black Rooster forms one category. Some combinations may not exist, such as mighty mite black belt.
An Athlete fights Athletes of the same Category, and if he classifies, he fights Athletes of different Weight Divisions (but of the same Gender, Age and Belt).
To the modeling. I have a Category table, already populated with all combinations that exists in the domain.
CREATE TABLE Category (
[Id] [int] IDENTITY(1,1) NOT NULL,
[AgeDivision_Id] [int] NULL,
[Gender] [int] NULL,
[BeltColor] [int] NULL,
[WeightDivision] [int] NULL
)
A CategorySet and a CategorySet_Category, which forms a many to many relationship with Category.
CREATE TABLE CategorySet (
[Id] [int] IDENTITY(1,1) NOT NULL,
[Championship_Id] [int] NOT NULL,
)
CREATE TABLE CategorySet_Category (
[CategorySet_Id] [int] NOT NULL,
[Category_Id] [int] NOT NULL
)
Given the following result set:
| Options_Id | Championship_Id | AgeDivision_Id | BeltColor | Gender | WeightDivision |
|------------|-----------------|----------------|-----------|--------|----------------|
1. | 2963 | 422 | 15 | 7 | 0 | 0 |
2. | 2963 | 422 | 15 | 7 | 0 | 1 |
3. | 2963 | 422 | 15 | 7 | 0 | 2 |
4. | 2963 | 422 | 15 | 7 | 0 | 3 |
5. | 2964 | 422 | 15 | 8 | 0 | 0 |
6. | 2964 | 422 | 15 | 8 | 0 | 1 |
7. | 2964 | 422 | 15 | 8 | 0 | 2 |
8. | 2964 | 422 | 15 | 8 | 0 | 3 |
Because athletes may fight two CategorySets, I need CategorySet and CategorySet_Category to be populated in two different ways (it can be two queries):
One Category_Set for each row, with one CategorySet_Category pointing to the corresponding Category.
One Category_Set that groups all WeightDivisions in one CategorySet in the same AgeDivision_Id, BeltColor, Gender. In this example, only BeltColor varies.
So the final result would have a total of 10 CategorySet rows:
| Id | Championship_Id |
|----|-----------------|
| 1 | 422 |
| 2 | 422 |
| 3 | 422 |
| 4 | 422 |
| 5 | 422 |
| 6 | 422 |
| 7 | 422 |
| 8 | 422 |
| 9 | 422 | /* groups different Weight Division for BeltColor 7 */
| 10 | 422 | /* groups different Weight Division for BeltColor 8 */
And CategorySet_Category would have 16 rows:
| CategorySet_Id | Category_Id |
|----------------|-------------|
| 1 | 1 |
| 2 | 2 |
| 3 | 3 |
| 4 | 4 |
| 5 | 5 |
| 6 | 6 |
| 7 | 7 |
| 8 | 8 |
| 9 | 1 | /* groups different Weight Division for BeltColor 7 */
| 9 | 2 | /* groups different Weight Division for BeltColor 7 */
| 9 | 3 | /* groups different Weight Division for BeltColor 7 */
| 9 | 4 | /* groups different Weight Division for BeltColor 7 */
| 10 | 5 | /* groups different Weight Division for BeltColor 8 */
| 10 | 6 | /* groups different Weight Division for BeltColor 8 */
| 10 | 7 | /* groups different Weight Division for BeltColor 8 */
| 10 | 8 | /* groups different Weight Division for BeltColor 8 */
I have no idea how to insert into CategorySet, grab it's generated Id, then use it to insert into CategorySet_Category
I hope I've made my intentions clear.
I've also created a SQLFiddle.
Edit 1: I commented in Jacek's answer that this would run only once, but this is false. It will run a couple of times a week. I have the option to run as SQL Command from C# or a stored procedure. Performance is not crucial.
Edit 2: Jacek suggested using SCOPE_IDENTITY to return the Id. Problem is, SCOPE_IDENTITY returns only the last inserted Id, and I insert more than one row in CategorySet.
Edit 3: Answer to #FutbolFan who asked how the FakeResultSet is retrieved.
It is a table CategoriesOption (Id, Price_Id, MaxAthletesByTeam)
And tables CategoriesOptionBeltColor, CategoriesOptionAgeDivision, CategoriesOptionWeightDivison, CategoriesOptionGender. Those four tables are basically the same (Id, CategoriesOption_Id, Value).
The query look like this:
SELECT * FROM CategoriesOption co
LEFT JOIN CategoriesOptionAgeDivision ON
CategoriesOptionAgeDivision.CategoriesOption_Id = co.Id
LEFT JOIN CategoriesOptionBeltColor ON
CategoriesOptionBeltColor.CategoriesOption_Id = co.Id
LEFT JOIN CategoriesOptionGender ON
CategoriesOptionGender.CategoriesOption_Id = co.Id
LEFT JOIN CategoriesOptionWeightDivision ON
CategoriesOptionWeightDivision.CategoriesOption_Id = co.Id
The solution described here will work correctly in multi-user environment and when destination tables CategorySet and CategorySet_Category are not empty.
I used schema and sample data from your SQL Fiddle.
First part is straight-forward
(ab)use MERGE with OUTPUT clause.
MERGE can INSERT, UPDATE and DELETE rows. In our case we need only to INSERT. 1=0 is always false, so the NOT MATCHED BY TARGET part is always executed. In general, there could be other branches, see docs. WHEN MATCHED is usually used to UPDATE; WHEN NOT MATCHED BY SOURCE is usually used to DELETE, but we don't need them here.
This convoluted form of MERGE is equivalent to simple INSERT, but unlike simple INSERT its OUTPUT clause allows to refer to the columns that we need.
MERGE INTO CategorySet
USING
(
SELECT
FakeResultSet.Championship_Id
,FakeResultSet.Price_Id
,FakeResultSet.MaxAthletesByTeam
,Category.Id AS Category_Id
FROM
FakeResultSet
INNER JOIN Category ON
Category.AgeDivision_Id = FakeResultSet.AgeDivision_Id AND
Category.Gender = FakeResultSet.Gender AND
Category.BeltColor = FakeResultSet.BeltColor AND
Category.WeightDivision = FakeResultSet.WeightDivision
) AS Src
ON 1 = 0
WHEN NOT MATCHED BY TARGET THEN
INSERT
(Championship_Id
,Price_Id
,MaxAthletesByTeam)
VALUES
(Src.Championship_Id
,Src.Price_Id
,Src.MaxAthletesByTeam)
OUTPUT inserted.id AS CategorySet_Id, Src.Category_Id
INTO CategorySet_Category (CategorySet_Id, Category_Id)
;
FakeResultSet is joined with Category to get Category.id for each row of FakeResultSet. It is assumed that Category has unique combinations of AgeDivision_Id, Gender, BeltColor, WeightDivision.
In OUTPUT clause we need columns from both source and destination tables. The OUTPUT clause in simple INSERT statement doesn't provide them, so we use MERGE here that does.
The MERGE query above would insert 8 rows into CategorySet and insert 8 rows into CategorySet_Category using generated IDs.
Second part
needs temporary table. I'll use a table variable to store generated IDs.
DECLARE #T TABLE (
CategorySet_Id int
,AgeDivision_Id int
,Gender int
,BeltColor int);
We need to remember the generated CategorySet_Id together with the combination of AgeDivision_Id, Gender, BeltColor that caused it.
MERGE INTO CategorySet
USING
(
SELECT
FakeResultSet.Championship_Id
,FakeResultSet.Price_Id
,FakeResultSet.MaxAthletesByTeam
,FakeResultSet.AgeDivision_Id
,FakeResultSet.Gender
,FakeResultSet.BeltColor
FROM
FakeResultSet
GROUP BY
FakeResultSet.Championship_Id
,FakeResultSet.Price_Id
,FakeResultSet.MaxAthletesByTeam
,FakeResultSet.AgeDivision_Id
,FakeResultSet.Gender
,FakeResultSet.BeltColor
) AS Src
ON 1 = 0
WHEN NOT MATCHED BY TARGET THEN
INSERT
(Championship_Id
,Price_Id
,MaxAthletesByTeam)
VALUES
(Src.Championship_Id
,Src.Price_Id
,Src.MaxAthletesByTeam)
OUTPUT
inserted.id AS CategorySet_Id
,Src.AgeDivision_Id
,Src.Gender
,Src.BeltColor
INTO #T(CategorySet_Id, AgeDivision_Id, Gender, BeltColor)
;
The MERGE above would group FakeResultSet as needed and insert 2 rows into CategorySet and 2 rows into #T.
Then join #T with Category to get Category.IDs:
INSERT INTO CategorySet_Category (CategorySet_Id, Category_Id)
SELECT
TT.CategorySet_Id
,Category.Id AS Category_Id
FROM
#T AS TT
INNER JOIN Category ON
Category.AgeDivision_Id = TT.AgeDivision_Id AND
Category.Gender = TT.Gender AND
Category.BeltColor = TT.BeltColor
;
This will insert 8 rows into CategorySet_Category.
Here is not the full answer, but direction which you can use to solve this:
1st query:
select row_number() over(order by t, Id) as n, Championship_Id
from (
select distinct 0 as t, b.Id, a.Championship_Id
from FakeResultSet as a
inner join
Category as b
on
a.AgeDivision_Id=b.AgeDivision_Id and
a.Gender=b.Gender and
a.BeltColor=b.BeltColor and
a.WeightDivision=b.WeightDivision
union all
select distinct 1, BeltColor, Championship_Id
from FakeResultSet
) as q
2nd query:
select q2.CategorySet_Id, c.Id as Category_Id from (
select row_number() over(order by t, Id) as CategorySet_Id, Id, BeltColor
from (
select distinct 0 as t, b.Id, null as BeltColor
from FakeResultSet as a
inner join
Category as b
on
a.AgeDivision_Id=b.AgeDivision_Id and
a.Gender=b.Gender and
a.BeltColor=b.BeltColor and
a.WeightDivision=b.WeightDivision
union all
select distinct 1, BeltColor, BeltColor
from FakeResultSet
) as q
) as q2
inner join
Category as c
on
(q2.BeltColor is null and q2.Id=c.Id)
OR
(q2.BeltColor = c.BeltColor)
of course this will work only for empty CategorySet and CategorySet_Category tables, but you can use select coalese(max(Id), 0) from CategorySet to get current number and add it to row_number, thus you will get real ID which will be inserted into CategorySet row for second query
What I do when I run into these situations is to create one or many temporary tables with row_number() over clauses giving me identities on the temporary tables. Then I check for the existence of each record in the actual tables, and if they exist update the temporary table with the actual record ids. Finally I run a while exists loop on the temporary table records missing the actual id and insert them one at a time, after the insert I update the temporary table record with the actual ids. This lets you work through all the data in a controlled manner.
##IDENTITY is your friend to the 2nd part of question
https://msdn.microsoft.com/en-us/library/ms187342.aspx
and
Best way to get identity of inserted row?
Some API (drivers) returns int from update() function, i.e. ID if it is "insert". What API/environment do You use?
I don't understand 1st problem. You should not insert identity column.
Below query will give final result For CategorySet rows:
SELECT
ROW_NUMBER () OVER (PARTITION BY Championship_Id ORDER BY Championship_Id) RNK,
Championship_Id
FROM
(
SELECT
Championship_Id
,BeltColor
FROM #FakeResultSet
UNION ALL
SELECT
Championship_Id,BeltColor
FROM #FakeResultSet
GROUP BY Championship_Id,BeltColor
)BASE

SQL Server: Select hierarchically related items from one table

Say, I have an organizational structure that is 5 levels deep:
CEO -> DeptHead -> Supervisor -> Foreman -> Worker
The hierarchy is stored in a table Position like this:
PositionId | PositionCode | ManagerId
1 | CEO | NULL
2 | DEPT01 | 1
3 | DEPT02 | 1
4 | SPRV01 | 2
5 | SPRV02 | 2
6 | SPRV03 | 3
7 | SPRV04 | 3
... | ... | ...
PositionId is uniqueidentifier. ManagerId is the ID of employee's manager, referring PositionId from the same table.
I need a SQL query to get the hierarchy tree going down from a position, provided as parameter, including the position itself. I managed to develop this:
-- Select the original position itself
SELECT
'Rank' = 0,
Position.PositionCode
FROM Position
WHERE Position.PositionCode = 'CEO' -- Parameter
-- Select the subordinates
UNION
SELECT DISTINCT
'Rank' =
CASE WHEN Pos2.PositionCode IS NULL THEN 0 ELSE 1+
CASE WHEN Pos3.PositionCode IS NULL THEN 0 ELSE 1+
CASE WHEN Pos4.PositionCode IS NULL THEN 0 ELSE 1+
CASE WHEN Pos5.PositionCode IS NULL THEN 0 ELSE 1
END
END
END
END,
'PositionCode' = RTRIM(ISNULL(Pos5.PositionCode, ISNULL(Pos4.PositionCode, ISNULL(Pos3.PositionCode, Pos2.PositionCode)))),
FROM Position Pos1
LEFT JOIN Position Pos2
ON Pos1.PositionId = Pos2.ManagerId
LEFT JOIN Position Pos3
ON Pos2.PositionId = Pos3.ManagerId
LEFT JOIN Position Pos4
ON Pos3.PositionId = Pos4.ManagerId
LEFT JOIN Position Pos5
ON Pos4.PositionId = Pos5.ManagerId
WHERE Pos1.PositionCode = 'CEO' -- Parameter
ORDER BY Rank ASC
It works not only for 'CEO' but for any position, displaying its subordinates. Which gives me the following output:
Rank | PositionCode
0 | CEO
... | ...
2 | SPRV55
2 | SPRV68
... | ...
3 | FRMN10
3 | FRMN12
... | ...
4 | WRKR01
4 | WRKR02
4 | WRKR03
4 | WRKR04
My problems are:
The output does not include intermediate nodes - it will only output end nodes, i.e. workers and intermediate managers which have no subordinates. I need all intermediate managers as well.
I have to manually UNION the row with original position on top of the output. I there any more elegant way to do this?
I want the output to be sorted in hieararchical tree order. Not all DeptHeads, then all Supervisors, then all Foremen then all workers, but like this:
Rank | PositionCode
0 | CEO
1 | DEPT01
2 | SPRV01
3 | FRMN01
4 | WRKR01
4 | WRKR02
... | ...
3 | FRMN02
4 | WRKR03
4 | WRKR04
... | ...
Any help would be greatly appreciated.
Try a recursive CTE, the example on TechNet is almost identical to your problem I believe:
http://technet.microsoft.com/en-us/library/ms186243(v=sql.105).aspx
Thx, everyone suggesting CTE. I got the following code and it's working okay:
WITH HierarchyTree (PositionId, PositionCode, Rank)
AS
(
-- Anchor member definition
SELECT PositionId, PositionCode,
0 AS Rank
FROM Position AS e
WHERE PositionCode = 'CEO'
UNION ALL
-- Recursive member definition
SELECT e.PositionId, e.PositionCode,
Rank + 1
FROM Position AS e
INNER JOIN HierarchyTree AS d
ON e.ManagerId = d.PositionId
)
SELECT Rank, PositionCode
FROM HierarchyTree
GO
I had a similar problem to yours on a recent project but with a variable recursion length - typically between 1 and 10 levels.
I wanted to simplify the SQL side of things so I put some extra work into the logic of storing the recursive elements by storing a "hierarchical path" in addition to the direct manager Id.
So a very contrived example:
Employee
Id | JobDescription | Hierarchy | ManagerId
1 | DIRECTOR | 1\ | NULL
2 | MANAGER 1 | 1\2\ | 1
3 | MANAGER 2 | 1\3\ | 1
4 | SUPERVISOR 1 | 1\2\4 | 2
5 | SUPERVISOR 2 | 1\3\5 | 3
6 | EMPLOYEE 1 | 1\2\4\6 | 4
7 | EMPLOYEE 2 | 1\3\5\7 | 5
This means you have the power to very quickly query any level of the tree and get all descendants by using a LIKE query on the Hierarchy column
For example
SELECT * FROM dbo.Employee WHERE Hierarchy LIKE '\1\2\%'
would return
MANAGER 1
SUPERVISOR 1
EMPLOYEE 1
Additionally you can also easily get one level of the tree by using the ManagerId column.
The downside to this approach is you have to construct the hierarchy when inserting or updating records but believe me when I say this storage structure saved me a lot of pain later on without the need for unnecessary query complexity.
One thing to note is that my approach gives you the raw data - I then parse the result set into a recursive strongly typed structure in my services layer. As a rule I don't tend to format output in SQL.