Querying hierarchical data with T-SQL and getting some aggregated data

I need to query some parent-child tree data in a single table. The structure of this table is defined by a customer's database and cannot be changed.
The result we are trying to achieve is one result row per table row, containing the row's own data plus the path from the top to the row itself in a display-friendly way.
Additionally, we need the topmost parent that was used in the hierarchy and the most restrictive validity dates in the hierarchy (that is, the max ValidFrom and the min ValidTo; NULL represents no restriction in validity). See row 8 for an example of an orphaned row whose parent no longer exists; this can well happen in this customer's database, as it does not render the data invalid in their use cases.
The table is defined like this:
CREATE TABLE [dbo].[HiTest]
(
[Id] [int] IDENTITY(1,1) NOT NULL,
[ParentId] [int] NULL,
[Name] [varchar](100) NOT NULL,
[ValidFrom] [datetime] NULL,
[ValidTo] [datetime] NULL,
CONSTRAINT [PK_HiTest]
PRIMARY KEY CLUSTERED ([Id] ASC)
WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF,
IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON,
ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]
) ON [PRIMARY]
And the test data:
SET IDENTITY_INSERT [dbo].[HiTest] ON
GO
INSERT [dbo].[HiTest] ([Id], [ParentId], [Name], [ValidFrom], [ValidTo])
VALUES (1, NULL, N'First Level', NULL, NULL),
(2, 1, N'Second Level 1', CAST(N'2022-01-01T00:00:00.000' AS DateTime), CAST(N'2022-12-31T00:00:00.000' AS DateTime)),
(3, 1, N'Second Level 2', NULL, NULL),
(4, 2, N'Third Level 1', CAST(N'2022-02-01T00:00:00.000' AS DateTime), CAST(N'2022-12-31T00:00:00.000' AS DateTime)),
(5, 3, N'Third Level 2', CAST(N'2022-03-01T00:00:00.000' AS DateTime), CAST(N'2022-10-31T00:00:00.000' AS DateTime)),
(6, 4, N'Fourth Level 1a', CAST(N'2022-01-01T00:00:00.000' AS DateTime), CAST(N'2022-09-30T00:00:00.000' AS DateTime)),
(7, 4, N'Fourth Level 1b', NULL, NULL),
(8, 23, N'Orphaned Level', NULL, NULL)
GO
SET IDENTITY_INSERT [dbo].[HiTest] OFF
And the expected result should be:
Can you point me to the best solution for this scenario? It needs to perform well, as there are roughly 120,000 rows in this table. At the moment I am playing around with a CTE solution, but I cannot get it to work.
Thank you in advance.
SR

A CTE is going to be the simplest option, given that you have multiple cascading rules that are not simple COALESCE scenarios; that is, it is not simple nullability checking. Judging by your output, the Agg Valid... range can only be more restrictive than the parent range, so if the parent is valid from April to August of the same year, then the child cannot be valid before April or after August. It can, however, be made valid from May to July, as this is a more restrictive range than the parent's.
Outside of the standard Hierarchy CTE logic the following conditions need to be tracked:
Keep a reference to the topmost record's [ParentId] and pass it through unchanged.
[Path] is the current [Id] if [ParentId] is NULL. If there is a [ParentId] value but no existing parent record, construct the path as [ParentId]\[Id]. Otherwise append a '\' and the child [Id] to the parent [Path].
[ValidFrom]: COALESCE from the parent (first non-null value); can be overridden if the child's value is greater than the parent's.
[ValidTo]: COALESCE from the parent (first non-null value); can be overridden if the child's value is less than the parent's.
At the end of the recursive CTE, we join back onto the main table to pick up the additional fields to display with the current record:
WITH Hierarchy AS (
SELECT [Id], [ParentId] -- < standard Hierarchy requirements
, [ParentId] AS [Master]
, [Path] = ISNULL(CAST([ParentId] AS VARCHAR(max)) + '\' + CAST([Id] AS VARCHAR(max)), CAST([Id] AS VARCHAR(max)))
, [ValidFrom], [ValidTo]
FROM [HiTest]
WHERE [ParentId] IS NULL OR NOT EXISTS (SELECT 1 FROM [HiTest] lookup WHERE lookup.[id] = [HiTest].[ParentId])
UNION ALL
SELECT [child].[Id], [child].[ParentId]
, [parent].[Master] --> don't change this value
, CONCAT([parent].[Path], '\', CAST([child].[Id] AS VARCHAR(max))) AS [Path]
, CASE
WHEN [parent].[ValidFrom] IS NULL THEN [child].[ValidFrom]
WHEN [child].[ValidFrom] > [parent].[ValidFrom] THEN [child].[ValidFrom]
ELSE [parent].[ValidFrom]
END
, CASE
WHEN [parent].[ValidTo] IS NULL THEN [child].[ValidTo]
WHEN [child].[ValidTo] < [parent].[ValidTo] THEN [child].[ValidTo]
ELSE [parent].[ValidTo]
END
FROM [HiTest] [child]
INNER JOIN Hierarchy parent ON [child].[ParentId] = [parent].[Id]
)
SELECT [HiTest].[Id], [HiTest].[ParentId], [Name], [HiTest].[ValidFrom], [HiTest].[ValidTo], [Path]
, [Hierarchy].[ValidFrom] AS [Agg. ValidFrom]
, [Hierarchy].[ValidTo] AS [Agg. ValidTo]
, [Master] AS [Top Most Parent Id]
FROM [Hierarchy]
INNER JOIN [HiTest] ON [Hierarchy].[Id] = [HiTest].[Id]
ORDER BY [Id]
View this fiddle here: https://dbfiddle.uk/5M6QLC8w
Usually we wouldn't sort a list like this by the child item Id (leaf), it would be more common to ORDER BY [Path] (branch) to visualize the Hierarchy:
Id ParentId Name            ValidFrom  ValidTo    Path    Agg. ValidFrom Agg. ValidTo Top Most Parent Id
-- -------- --------------- ---------- ---------- ------- -------------- ------------ ------------------
1  null     First Level     null       null       1       null           null         null
2  1        Second Level 1  2022-01-01 2022-12-31 1\2     2022-01-01     2022-12-31   null
4  2        Third Level 1   2022-02-01 2022-12-31 1\2\4   2022-02-01     2022-12-31   null
6  4        Fourth Level 1a 2022-01-01 2022-09-30 1\2\4\6 2022-02-01     2022-09-30   null
7  4        Fourth Level 1b null       null       1\2\4\7 2022-02-01     2022-12-31   null
3  1        Second Level 2  null       null       1\3     null           null         null
5  3        Third Level 2   2022-03-01 2022-10-31 1\3\5   2022-03-01     2022-10-31   null
8  23       Orphaned Level  2011-01-01 2012-12-31 23\8    2011-01-01     2012-12-31   23
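To sanity-check the expected values outside the database, the same rules can be sketched in plain Python. This only illustrates the logic and is not the T-SQL solution itself: the names rows and resolve are hypothetical, the data mirrors the result table above, and the sketch assumes the hierarchy contains no cycles.

```python
from datetime import date

# id -> (parent_id, valid_from, valid_to); None stands for SQL NULL
rows = {
    1: (None, None, None),
    2: (1, date(2022, 1, 1), date(2022, 12, 31)),
    3: (1, None, None),
    4: (2, date(2022, 2, 1), date(2022, 12, 31)),
    5: (3, date(2022, 3, 1), date(2022, 10, 31)),
    6: (4, date(2022, 1, 1), date(2022, 9, 30)),
    7: (4, None, None),
    8: (23, date(2011, 1, 1), date(2012, 12, 31)),  # parent 23 does not exist
}


def resolve(row_id):
    """Walk up the parent chain; return (path, agg_from, agg_to, top_parent)."""
    path, agg_from, agg_to = [], None, None
    current = row_id
    while current in rows:
        parent_id, valid_from, valid_to = rows[current]
        path.append(current)
        # Most restrictive range: latest start and earliest end, NULLs ignored
        if valid_from is not None and (agg_from is None or valid_from > agg_from):
            agg_from = valid_from
        if valid_to is not None and (agg_to is None or valid_to < agg_to):
            agg_to = valid_to
        if parent_id is None:
            # Reached a real root, so there is no "top most parent" id
            return "\\".join(map(str, reversed(path))), agg_from, agg_to, None
        current = parent_id
    # The chain ended at an id that has no row: an orphaned branch
    path.append(current)
    return "\\".join(map(str, reversed(path))), agg_from, agg_to, current


for row_id in sorted(rows):
    print(row_id, *resolve(row_id))
```

Walking upwards per row is easy to follow but repeats work for shared ancestors; the recursive CTE goes top-down and visits each row once, which matters more at 120,000 rows.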

Related

DATEDIFF based on next populated column in row

I am working on a query in SQL Server that is giving me a result set that looks something like this:
ID DaysInState DaysInState2 DaysInState3 DaysInState4
-- ----------- ------------ ------------ ------------
1  2022-04-01  2022-04-07   NULL         NULL
2  NULL        2022-04-09   NULL         NULL
3  2022-04-11  2022-04-15   NULL         2022-04-18
4  2022-04-11  NULL         NULL         2022-04-18
I need to calculate the number of days that a given item spent in a given state. The challenge I am facing is 'looking ahead' in the row. Using row 1 as an example these would be the following values:
DaysInState: 6 (DATEDIFF(day, '2022-04-01', '2022-04-07'))
DaysInState2: 12 (DATEDIFF(day, '2022-04-07', GETDATE()))
DaysInState3: NULL
DaysInState4: NULL
The challenging part here is that for each column in each row, I have to look at all the columns to the right of the reference column to see if a date exists to use in DATEDIFF. If a date is not found to the right of the reference column then GETDATE() is used. The table below shows what the result set should look like:
ID DaysInState DaysInState2 DaysInState3 DaysInState4
-- ----------- ------------ ------------ ------------
1  6           12           NULL         NULL
2  NULL        10           NULL         NULL
3  4           3            NULL         1
4  7           NULL         NULL         1
I can write fairly convoluted CASE...WHEN statements for each column such that
SELECT
CASE
WHEN DaysInState IS NOT NULL AND DaysInState2 IS NOT NULL THEN DateDiff(day, DaysInState, DaysInState2)
WHEN DaysInState IS NOT NULL AND DaysInState2 IS NULL AND DaysInState3 IS NOT NULL THEN DateDiff(day, DaysInState, DaysInState3)
...
END
...
However this isn't very maintainable when states are added / removed. Is there a more dynamic approach to solving this problem that doesn't involve lengthy CASE statements or just generally a "better" approach that maybe I am not seeing?
The COALESCE function allows multiple parameters, evaluating them from left to right, returning the first non-null value, eliminating the need for nesting:
DaysInState =
DATEDIFF(day,
DaysInState,
COALESCE(DaysInState2
,DaysInState3
,DaysInState4
,GETDATE())
)
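The look-ahead behaviour of that COALESCE call can be checked with a small Python sketch. The function name days_in_states and the fixed TODAY value are assumptions made for illustration; 2022-04-19 is simply the date that reproduces the expected values in the question.

```python
from datetime import date

TODAY = date(2022, 4, 19)  # stand-in for GETDATE(); assumed, not from the question


def days_in_states(dates, today=TODAY):
    """For each state date, count days until the next populated date, else until today."""
    result = []
    for i, start in enumerate(dates):
        if start is None:
            # DATEDIFF with a NULL start yields NULL
            result.append(None)
            continue
        # Same idea as COALESCE(next columns..., GETDATE()): first non-null to the right
        following = next((d for d in dates[i + 1:] if d is not None), today)
        result.append((following - start).days)
    return result


row3 = [date(2022, 4, 11), date(2022, 4, 15), None, date(2022, 4, 18)]
print(days_in_states(row3))  # [4, 3, None, 1]
```

Adding a state only means adding one more element to the list here, which corresponds to adding one more argument to each COALESCE in the query.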
If it is possible to adjust the query generating your result set, I'd like to suggest a new approach. One advantage is that it can handle additional DayInState variables (5,6,7,...).
Rewrite your query so that your results have three columns: one for ID, one for "DayInState" number, and one for the date. That is, no NULL values returned. Union the result set with the distinct IDs, an exceedingly large "DayInState" number, and the result of GETDATE(). Then you can use DATEDIFF() with LAG() to look at the next dates.
Here's a working example in SQL Server using your data:
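begin
-- table variable with one row per (id, state) that has a date
declare @temp table (id int,state_num int,dt date)
insert into @temp values
(1,1,'2022-04-01'),
(1,2,'2022-04-07'),
(2,2,'2022-04-09'),
(3,1,'2022-04-11'),
(3,2,'2022-04-15'),
(3,4,'2022-04-18'),
(4,1,'2022-04-11'),
(4,4,'2022-04-18')
-- ordering by state_num desc makes LAG() return the date of the next populated state
-- note: WHERE is applied before LAG() is evaluated, so for the last populated state
-- the GETDATE() default of LAG() is what gets used
select t.id,t.state_num,DATEDIFF(day,t.dt,LAG(t.dt,1,GETDATE()) over(partition by t.id order by t.state_num desc)) as days_in_state
from
(select * from @temp
union (select distinct id,999 as state_num, GETDATE() as dt from @temp) ) t
where t.state_num!=999
order by t.id,t.state_num
end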
IF EXISTS (SELECT * FROM sys.objects WHERE object_id = OBJECT_ID(N'[dbo].[DateTest]') AND type in (N'U'))
DROP TABLE [dbo].[DateTest]
GO
SET ANSI_NULLS ON
GO
SET QUOTED_IDENTIFIER ON
GO
CREATE TABLE [dbo].[DateTest](
[Id] [int] IDENTITY(1,1) NOT NULL,
[DaysInState] [date] NULL,
[DaysInState2] [date] NULL,
[DaysInState3] [date] NULL,
[DaysInState4] [date] NULL,
CONSTRAINT [PK_DateTest] PRIMARY KEY CLUSTERED
(
[Id] ASC
)WITH (STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF, OPTIMIZE_FOR_SEQUENTIAL_KEY = OFF) ON [PRIMARY]
) ON [PRIMARY]
GO
INSERT INTO [dbo].[DateTest] ([DaysInState],[DaysInState2],[DaysInState3],[DaysInState4]) VALUES
('2022-04-01','2022-04-07',NULL,NULL),
(NULL,'2022-04-09',NULL,NULL),
('2022-04-11','2022-04-15',NULL,'2022-04-18'),
('2022-04-11',NULL,NULL,'2022-04-18');
GO
SELECT [ID],[DaysInState],[DaysInState2],[DaysInState3],[DaysInState4] FROM dbo.DateTest
Use nested ISNULL to check the next column or pass GETDATE().
Using the variable means you can alter the date if needed.
DECLARE @theDate date = GETDATE()
SELECT
[DaysInState] =DATEDIFF(day,[DaysInState], ISNULL([DaysInState2],ISNULL([DaysInState3],ISNULL([DaysInState4],@theDate))))
,[DaysInState2] =DATEDIFF(day,[DaysInState2],ISNULL([DaysInState3],ISNULL([DaysInState4],@theDate)))
,[DaysInState3] =DATEDIFF(day,[DaysInState3],ISNULL([DaysInState4],@theDate))
,[DaysInState4] =DATEDIFF(day,[DaysInState4],@theDate)
FROM dbo.DateTest
Your original query
SELECT
[DaysInState]=CASE
WHEN DaysInState IS NOT NULL AND DaysInState2 IS NOT NULL THEN DateDiff(day, DaysInState, DaysInState2)
WHEN DaysInState IS NOT NULL AND DaysInState2 IS NULL AND DaysInState3 IS NOT NULL THEN DateDiff(day, DaysInState, DaysInState3)
END
FROM dbo.DateTest

Recursively Conditionally get parent Id

I have two tables: Budget Line and Expense. They are structured in such a way that an expense must have either a parent record in the budget line table, or a parent record in the expense table. I need to select all child-most expenses for each budget line.
For example - BudgetLine:
Id Description
-- -----------
1  TEST123
2  OTHERTEST
Expense:
Id ParentId ParentType Description
-- -------- ---------- -----------------
1  1        BudgetLine Group of Expenses
2  1        Expense    Expense # 1
3  1        Expense    Expense # 2
4  2        BudgetLine Expense # 3
Desired result:
BudgetLineId ExpenseId Description
------------ --------- -----------
1            2         Expense # 1
1            3         Expense # 2
2            4         Expense # 3
I am looking to omit expenses in the result only if they are the only sub-child. Note that an expense may have many children, grandchildren, etc.
I have tried the following, and have researched various recursive CTE methods:
WITH RCTE AS
(
SELECT Expense.Id, Expense.ParentId, Expense.ParentType, 1 AS Lvl, Expense.Id as startId FROM Expense
UNION ALL
SELECT rh.Id, rh.ParentId, rh.ParentType, Lvl+1 AS Lvl, rc.Id as startId FROM dbo.Expense rh
INNER JOIN RCTE rc ON rh.Id = rc.ParentId and rc.ParentType = 'Expense'
),
FilteredRCTE AS
(
SELECT startId, MAX(LVL) AS Lvl
FROM RCTE
GROUP BY startID
),
RecursiveData AS
(
SELECT FilteredRCTE.startId AS ExpenseId, RCTE.ParentId AS BudgetLineId
FROM FilteredRCTE
JOIN RCTE ON FilteredRCTE.startId = RCTE.startId AND FilteredRCTE.Lvl = RCTE.Lvl
)
SELECT *
FROM RecursiveData
This did in fact obtain all the child Expenses and their associated parent BudgetLine, but it also included the middle-tier expenses (such as item 1 in the example), and I cannot figure out how to filter those middle-tier items out.
Here is a script to create tables / insert sample data:
CREATE TABLE [dbo].[BudgetLine]
(
[Id] [int] IDENTITY(1,1) NOT NULL,
[Description] [varchar](500) NULL,
) ON [PRIMARY]
GO
INSERT INTO dbo.BudgetLine VALUES ('TEST123')
INSERT INTO dbo.BudgetLine VALUES ('OTHERTEST')
CREATE TABLE [dbo].[Expense]
(
[Id] [int] IDENTITY(1,1) NOT NULL,
[ParentId] [int] NOT NULL,
[ParentType] [varchar](100) NOT NULL,
[Description] [varchar](max) NULL,
) ON [PRIMARY]
GO
INSERT INTO dbo.Expense VALUES ('1', 'BudgetLine', 'Group of Expenses')
INSERT INTO dbo.Expense VALUES ('1', 'Expense', 'Expense # 1')
INSERT INTO dbo.Expense VALUES ('1', 'Expense', 'Expense # 2')
INSERT INTO dbo.Expense VALUES ('2', 'BudgetLine', 'Expense # 3')
Maybe I have oversimplified, but the following returns your desired results, by checking that there is no other expense row connected to the current row.
WITH RCTE AS
(
SELECT E.Id ExpenseId, E.ParentId, E.ParentType
FROM #Expense E
UNION ALL
SELECT RH.Id, RH.ParentId, RH.ParentType
FROM #Expense RH
INNER JOIN RCTE RC ON RH.Id = RC.ParentId AND RC.ParentType = 'Expense'
)
SELECT *
FROM RCTE R1
WHERE NOT EXISTS (
SELECT 1
FROM RCTE R2
WHERE R2.ParentId = R1.ExpenseId AND R2.ParentType = 'Expense'
);
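The idea behind that filter (keep only expenses that no other expense names as its parent, then follow the parent chain up to a budget line) can be sketched in Python. The dictionary layout and the function name leaf_expenses are hypothetical and only mirror the sample data from the question; the sketch assumes every chain eventually reaches a BudgetLine.

```python
# id -> (parent_id, parent_type, description), as in the question's sample data
expenses = {
    1: (1, "BudgetLine", "Group of Expenses"),
    2: (1, "Expense", "Expense # 1"),
    3: (1, "Expense", "Expense # 2"),
    4: (2, "BudgetLine", "Expense # 3"),
}


def leaf_expenses(expenses):
    """Return (budget_line_id, expense_id, description) for child-most expenses."""
    # Ids that some other expense points to as its parent
    has_children = {
        parent_id
        for parent_id, parent_type, _ in expenses.values()
        if parent_type == "Expense"
    }
    result = []
    for expense_id in sorted(expenses):
        if expense_id in has_children:
            continue  # middle-tier expense, skip it
        parent_id, parent_type, description = expenses[expense_id]
        # Climb until the parent is a budget line
        while parent_type == "Expense":
            parent_id, parent_type, _ = expenses[parent_id]
        result.append((parent_id, expense_id, description))
    return result


print(leaf_expenses(expenses))
```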

Write SQL to identify multiple subgroupings within a grouping

I have a program that summarizes non-normalized data in one table and moves it to another and we frequently get a duplicate key violation on the insert due to bad data. I want to create a report for the users to help them identify the cause of the error.
For example, consider the following contrived simple SQL, which summarizes data in the table Companies and inserts it into CompanySum, which has a primary key of State/Zone. In order for the INSERT not to fail, there cannot be more than one distinct combination of Company/Code for each unique primary key State/Zone combination. If there is, we want the insert to fail so that the data can be corrected.
INSERT INTO CompanySum
(
[State]
,[Zone]
,[Company]
,[Code]
,[Revenue]
)
SELECT
--Keys of target
[State]
,[Zone]
--We are expecting to have one distinct combination of these fields per key grouping
,[Company]
,[Code]
--Aggregate
,SUM([Revenue])
FROM COMPANIES
GROUP BY
[State]
,[Zone]
,[Company]
,[Code]
I would like to create a report to help the users easily identify and correct the data so that there is only one distinct Company/Code combination within a State/Zone. For each distinct State/Zone value, I would like to identify the distinct Company/Code combinations within the State/Zone. If there is more than one Company/Code combination within a State/Zone, I would like all of the records in that State/Zone to be displayed in the output. For example, here is the sample input and desired output:
Data:
RecordNumber State Zone Company Code Revenue
------------ ----- ---- ------- ---- --------
1 CT B State of CT 65453 10
2 CT B State of CT 65453 3
3 CT B Travelers 33443 20
4 CT C Cigna 45678 24
5 CT C Cigna 45678 234
6 MI A GM 48089 100
7 MI A GM 54555 200
8 MI B Chrysler 43434 44
Desired Output:
RecordNumber State Zone Company Code Revenue
------------ ----- ---- ------- ---- --------
1 CT B State of CT 65453 10
2 CT B State of CT 65453 3
3 CT B Travelers 33443 20
6 MI A GM 48089 100
7 MI A GM 54555 200
Here is the DDL and DML needed to create this test scenario
CREATE TABLE [dbo].[Companies](
[RecordNumber] [int] NULL,
[State] [char](2) NOT NULL,
[Zone] [varchar](30) NOT NULL,
[Company] [varchar](30) NOT NULL,
[Code] [varchar](30) NOT NULL,
[Revenue] [numeric](9, 1) NULL
) ON [PRIMARY]
CREATE TABLE [dbo].[CompanySum](
[State] [char](2) NOT NULL,
[Zone] [varchar](30) NOT NULL,
[Company] [varchar](30) NOT NULL,
[Code] [varchar](30) NOT NULL,
[Revenue] [numeric](9, 1) NULL,
CONSTRAINT [PK_CompanySum] PRIMARY KEY CLUSTERED
(
[State] ASC,
[Zone] ASC
)WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]
) ON [PRIMARY]
DELETE FROM [dbo].[Companies]
GO
INSERT [dbo].[Companies] ([RecordNumber], [State], [Zone], [Company], [Code], [Revenue]) VALUES (1, N'CT', N'B', N'State of CT', N'65453', CAST(10.0 AS Numeric(9, 1)))
GO
INSERT [dbo].[Companies] ([RecordNumber], [State], [Zone], [Company], [Code], [Revenue]) VALUES (2, N'CT', N'B', N'State of CT', N'65453', CAST(3.0 AS Numeric(9, 1)))
GO
INSERT [dbo].[Companies] ([RecordNumber], [State], [Zone], [Company], [Code], [Revenue]) VALUES (3, N'CT', N'B', N'Travelers', N'33443', CAST(20.0 AS Numeric(9, 1)))
GO
INSERT [dbo].[Companies] ([RecordNumber], [State], [Zone], [Company], [Code], [Revenue]) VALUES (4, N'CT', N'C', N'Cigna', N'45678', CAST(24.0 AS Numeric(9, 1)))
INSERT [dbo].[Companies] ([RecordNumber], [State], [Zone], [Company], [Code], [Revenue]) VALUES (5, N'CT', N'C', N'Cigna', N'45678', CAST(234.0 AS Numeric(9, 1)))
GO
INSERT [dbo].[Companies] ([RecordNumber], [State], [Zone], [Company], [Code], [Revenue]) VALUES (6, N'MI', N'A', N'GM', N'48089', CAST(100.0 AS Numeric(9, 1)))
GO
INSERT [dbo].[Companies] ([RecordNumber], [State], [Zone], [Company], [Code], [Revenue]) VALUES (7, N'MI', N'A', N'GM', N'54555', CAST(200.0 AS Numeric(9, 1)))
GO
INSERT [dbo].[Companies] ([RecordNumber], [State], [Zone], [Company], [Code], [Revenue]) VALUES (8, N'MI', N'B', N'Chrysler', N'43434', CAST(44.0 AS Numeric(9, 1)))
GO
This is a hopefully better reconstruction of a previous post of mine, "SQL to return unique combinations of non key columns within a set of key columns", in which I am trying to clarify the question and provide a simple working example that readers can use.
Please see this SQL Fiddle:
http://sqlfiddle.com/#!18/d0141/1
Is this a solution?
Fiddle: http://sqlfiddle.com/#!18/12e9a0/9
select c.*
from
Companies c
inner join (
select State, Zone
from Companies
group by State, Zone
having count(distinct Company + Code) > 1
) as dup_state_zone
on(
c.State = dup_state_zone.State
and c.Zone = dup_state_zone.Zone
)
Edited - Fix the having clause, with a little cheat...
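One thing to keep in mind with counting distinct Company + Code strings: concatenation can make two different pairs look identical (for example 'AB' + 'C' and 'A' + 'BC'). The grouping logic itself can be sketched in Python, where a tuple keeps the two values apart. The function name conflicting_records is hypothetical and the records are the sample data from the question.

```python
from collections import defaultdict

# (RecordNumber, State, Zone, Company, Code, Revenue)
records = [
    (1, "CT", "B", "State of CT", "65453", 10.0),
    (2, "CT", "B", "State of CT", "65453", 3.0),
    (3, "CT", "B", "Travelers", "33443", 20.0),
    (4, "CT", "C", "Cigna", "45678", 24.0),
    (5, "CT", "C", "Cigna", "45678", 234.0),
    (6, "MI", "A", "GM", "48089", 100.0),
    (7, "MI", "A", "GM", "54555", 200.0),
    (8, "MI", "B", "Chrysler", "43434", 44.0),
]


def conflicting_records(records):
    """Return every record whose State/Zone has more than one Company/Code pair."""
    combos = defaultdict(set)
    for _, state, zone, company, code, _ in records:
        # Store the pair as a tuple so ('AB', 'C') and ('A', 'BC') stay distinct
        combos[(state, zone)].add((company, code))
    return [r for r in records if len(combos[(r[1], r[2])]) > 1]


print([r[0] for r in conflicting_records(records)])  # [1, 2, 3, 6, 7]
```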
I used a window ranking function to rank the records within each state, ordered by zone ascending, to get the desired output.
Suggestion: the INSERT into CompanySum will fail because of your primary key constraint, since you are selecting records with duplicate keys. In this case you need to change your primary key constraint a little.
CONSTRAINT [PK_CompanySum] PRIMARY KEY CLUSTERED
(
[State] ASC,
[Zone] ASC
)WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF,
ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]
) ON [PRIMARY]
Since State and Zone both contain duplicate values, this insert will fail. It is better to add an auto-increment primary key, or to include RecordNumber in the primary key constraint rather than relying on State and Zone alone for uniqueness, as there are duplicate values in your desired output.
SELECT
A.[RecordNumber]
,A.[State]
,A.[Zone]
,A.[Company]
,A.Code
,A.Revenue
FROM
(
SELECT *
,RANK() OVER (PARTITION BY [STATE] ORDER BY Zone) AS [row]
FROM Companies
) AS A
WHERE [row] =1
Highlighted are duplicates which will make your insert fail.

CTE arithmetic shift operator causes arithmetic overflow error

When executing a CTE expression to query for an ordered child parent relation by using a shift, it fails with
Arithmetic overflow error converting expression to data type bigint
The problem is that the shift value grows very quickly. I know I could widen the data type to support 38 digits, but I would still hit that limit with deep parent-child relations. I'm wondering whether there is any other method to order the results so that I would not hit this limit.
Here is a sample script that shows the increase of the shift parameter.
CREATE TABLE [dbo].[ParentChild] (
[Id] [int] IDENTITY(1,1) NOT NULL,
[ParentId] [int] NULL,
[Name] [nvarchar](150) NOT NULL
CONSTRAINT [PK_Dialog] PRIMARY KEY CLUSTERED
(
[Id] ASC
))
GO
ALTER TABLE [dbo].[ParentChild] WITH CHECK ADD CONSTRAINT [FK_ParentChild_ParentId] FOREIGN KEY([ParentId])
REFERENCES [dbo].[ParentChild] ([Id])
GO
ALTER TABLE [dbo].[ParentChild] CHECK CONSTRAINT [FK_ParentChild_ParentId]
GO
set identity_insert [dbo].[ParentChild] on
insert into [dbo].[ParentChild] ([Id], [ParentId],[Name])
values
(1, NULL, '1'),
(2, NULL, '2'),
(3, 1, '1.1'),
(4, 1, '1.2'),
(5, 2, '2.1'),
(6, 5, '2.1.1')
set identity_insert [dbo].[ParentChild] off
-- without shift
with Parent as (
select d1.[Id], d1.[ParentId], d1.[Name], 0 AS [Level]
FROM [dbo].[ParentChild] as d1
WHERE d1.[ParentId] IS NULL
UNION ALL
SELECT d2.Id, d2.ParentId, d2.[Name], [Level] + 1
FROM [dbo].[ParentChild] as d2
INNER JOIN Parent d1 ON d1.[Id] = d2.ParentId
)
select p.Id, p.ParentId, p.[Name], [Level]
from Parent p
group by p.Id, p.ParentId, p.[Name], [Level];
-- desired
with Parent as (
select d1.[Id], d1.[ParentId], d1.[Name], 0 AS [Level],
CAST(row_number() over(order by id) as DECIMAL(38,0)) as [shift]
FROM [dbo].[ParentChild] as d1
WHERE d1.[ParentId] IS NULL
UNION ALL
SELECT d2.Id, d2.ParentId, d2.[Name], [Level] + 1,
CAST([shift] * 100 + row_number() over(order by d2.id) as DECIMAL(38,0))
FROM [dbo].[ParentChild] as d2
INNER JOIN Parent d1 ON d1.[Id] = d2.ParentId
)
select p.Id, p.ParentId, p.[Name], [Level], [shift]
from Parent p
group by p.Id, p.ParentId, p.[Name], [Level], [shift]
order by cast([shift] as varchar(50))
Output without the shift parameter
Id ParentId Name Level
1 NULL 1 0
2 NULL 2 0
3 1 1.1 1
4 1 1.2 1
5 2 2.1 1
6 5 2.1.1 2
Output with the shift parameter (desired)
Id ParentId Name Level shift
1 NULL 1 0 1
3 1 1.1 1 101
4 1 1.2 1 102
2 NULL 2 0 2
5 2 2.1 1 201
6 5 2.1.1 2 20101
Assuming we can make shift a string rather than a maths-supporting data type, we can just do this:
with Parent as (
select d1.[Id], d1.[ParentId], d1.[Name], 0 AS [Level],
CONVERT(varchar(max),row_number() over(order by id)) as [shift]
FROM [dbo].[ParentChild] as d1
WHERE d1.[ParentId] IS NULL
UNION ALL
SELECT d2.Id, d2.ParentId, d2.[Name], [Level] + 1,
shift + RIGHT('0' + CONVERT(varchar(2),row_number() over(order by d2.id)),2)
FROM [dbo].[ParentChild] as d2
INNER JOIN Parent d1 ON d1.[Id] = d2.ParentId
)
select p.Id, p.ParentId, p.[Name], [Level], [shift]
from Parent p
group by p.Id, p.ParentId, p.[Name], [Level], [shift]
order by shift
It produces different results if the row numbers can ever exceed 99, but that seems to lead to problems with this representation anyway (ambiguous encodings).
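Why a string key sorts correctly without ever overflowing can be seen in a short Python sketch. It is a variation rather than a transcription of the query above: every level, including the root, is zero-padded to a fixed width, and the names ordered and width are hypothetical.

```python
from collections import defaultdict

# (Id, ParentId, Name) rows from the question's sample data
rows = [(1, None, "1"), (2, None, "2"), (3, 1, "1.1"),
        (4, 1, "1.2"), (5, 2, "2.1"), (6, 5, "2.1.1")]


def ordered(rows, width=2):
    """Return (sort_key, id, name, level) tuples in hierarchy order."""
    children = defaultdict(list)
    for row_id, parent_id, name in rows:
        children[parent_id].append((row_id, name))

    result = []

    def visit(parent_id, prefix, level):
        # Number siblings by Id, like row_number() over (order by id)
        for n, (row_id, name) in enumerate(sorted(children[parent_id]), start=1):
            segment = str(n).zfill(width)
            if len(segment) > width:
                raise ValueError("too many siblings for the chosen segment width")
            key = prefix + segment
            result.append((key, row_id, name, level))
            visit(row_id, key, level + 1)

    visit(None, "", 0)
    # Plain string comparison is enough because every segment has the same width
    return sorted(result)


for key, row_id, name, level in ordered(rows):
    print(key, row_id, name, level)
```

The key grows by a fixed number of characters per level, so depth is limited only by the string length, while the number of siblings per parent is limited by the segment width.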

SQL Server 2008: many-to-many relationship: Concatenation in SELECT query [duplicate]

There are 3 tables:
Project
Tool
LinkProjectTool
I need a query that lists everything in the Project table plus an extra column called ProjectTools. This column should contain a comma delimited string with all the tool names belonging to each project.
The data is:
Table Project:
ID Name Client
------------------------
0 table Anna
1 chair Bobby
2 workbench James
3 window Jenny
4 shelves Matthew
Table Tool:
ID Name
------------------------
0 hammer
1 measuring tape
2 pliers
3 scissors
4 spanner
5 saw
6 screwdriver
Table LinkProjectTool:
IDProject IDTool
-------------------
0 0
0 3
2 1
2 4
2 5
The result should be:
ID Name Client ProjectTools
-------------------------------------------------------------
0 table Anna hammer, scissors
1 chair Bobby
2 workbench James measuring tape, spanner, saw
3 window Jenny
4 shelves Matthew
Here are the queries I used to create these tables:
CREATE TABLE [dbo].[Project]
(
[ID] [int] NOT NULL,
[Name] [nvarchar](15) NOT NULL,
[Client] [nvarchar](15) NULL
)
INSERT INTO [dbo].[Project]
(ID, Name, Client)
VALUES
(0, 'table', 'Anna'),
(1, 'chair', 'Bobby'),
(2, 'workbench', 'James'),
(3, 'window', 'Jenny'),
(4, 'shelves', 'Matthew')
CREATE TABLE [dbo].[Tool](
[ID] [tinyint] IDENTITY(0,1) NOT NULL,
[Name] [nvarchar](30) NULL,
CONSTRAINT [PK_Tool] PRIMARY KEY CLUSTERED
(
[ID] ASC
)WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]
) ON [PRIMARY]
INSERT INTO [dbo].Tool
(Name)
VALUES
('hammer'),
('measuring tape'),
('pliers'),
('scissors'),
('spanner'),
('saw'),
('screwdriver')
CREATE TABLE [dbo].LinkProjectTool
(
[IDProject] [int] NOT NULL,
[IDTool] [tinyint] NULL
)
INSERT INTO [dbo].LinkProjectTool
(IDProject, IDTool)
VALUES
(0, 0),
(0, 3),
(2, 1),
(2, 4),
(2, 5)
Could you, please, help?
Thank you.
You can use STUFF function alongside with FOR XML (see this answer for a more detailed explanation on how they work).
Assuming you want the project tools to be separated by a comma and a blank space, you can use the following query:
SELECT DISTINCT p.ID, p.Name, p.Client,
ProjectTools = STUFF((
SELECT ', ' + t.Name
FROM Tool t
WHERE t.ID IN (SELECT IDTool FROM LinkProjectTool WHERE IdProject = p.ID)
FOR XML PATH(''), TYPE).value('.', 'NVARCHAR(MAX)'), 1, 2, '')
FROM Project p LEFT OUTER JOIN LinkProjectTool lpt ON p.Id = lpt.IDProject
ORDER BY p.ID
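For comparing against the expected output, the same grouping can be sketched in Python. The function name project_tools is hypothetical and the data is copied from the question; projects without tools get an empty string in this sketch.

```python
from collections import defaultdict

projects = [(0, "table", "Anna"), (1, "chair", "Bobby"), (2, "workbench", "James"),
            (3, "window", "Jenny"), (4, "shelves", "Matthew")]
tools = {0: "hammer", 1: "measuring tape", 2: "pliers", 3: "scissors",
         4: "spanner", 5: "saw", 6: "screwdriver"}
links = [(0, 0), (0, 3), (2, 1), (2, 4), (2, 5)]  # (IDProject, IDTool)


def project_tools(projects, tools, links):
    """Return (ID, Name, Client, ProjectTools) with tool names joined by ', '."""
    names = defaultdict(list)
    for project_id, tool_id in sorted(links):
        names[project_id].append(tools[tool_id])
    return [
        (project_id, name, client, ", ".join(names.get(project_id, [])))
        for project_id, name, client in projects
    ]


for row in project_tools(projects, tools, links):
    print(row)
```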