Write SQL to identify multiple subgroupings within a grouping - sql

I have a program that summarizes non-normalized data in one table and moves it to another, and we frequently get a duplicate key violation on the insert due to bad data. I want to create a report for the users to help them identify the cause of the error.
For example, consider the following contrived, simple SQL, which summarizes data in the table Companies and inserts it into CompanySum, which has a primary key of State/Zone. For the INSERT not to fail, there cannot be more than one distinct Company/Code combination for each State/Zone primary key value. If there is, we want the insert to fail so that the data can be corrected.
INSERT INTO CompanySum
(
[State]
,[Zone]
,[Company]
,[Code]
,[Revenue]
)
SELECT
--Keys of target
[State]
,[Zone]
--We are expecting to have one distinct combination of these fields per key grouping
,[Company]
,[Code]
--Aggregate
,SUM([Revenue])
FROM COMPANIES
GROUP BY
[State]
,[Zone]
,[Company]
,[Code]
I would like to create a report to help the users easily identify and correct the data so that there is only one distinct Company/Code combination within a State/Zone. For each distinct State/Zone value, I would like to identify the distinct Company/Code combinations within the State/Zone. If there is more than one Company/Code combination within a State/Zone, I would like all of the records in that State/Zone to be displayed in the output. For example, here is the sample input and desired output:
Data:
RecordNumber State Zone Company     Code  Revenue
------------ ----- ---- ----------- ----- -------
1            CT    B    State of CT 65453 10
2            CT    B    State of CT 65453 3
3            CT    B    Travelers   33443 20
4            CT    C    Cigna       45678 24
5            CT    C    Cigna       45678 234
6            MI    A    GM          48089 100
7            MI    A    GM          54555 200
8            MI    B    Chrysler    43434 44
Desired Output:
RecordNumber State Zone Company     Code  Revenue
------------ ----- ---- ----------- ----- -------
1            CT    B    State of CT 65453 10
2            CT    B    State of CT 65453 3
3            CT    B    Travelers   33443 20
6            MI    A    GM          48089 100
7            MI    A    GM          54555 200
Here is the DDL and DML needed to create this test scenario
CREATE TABLE [dbo].[Companies](
[RecordNumber] [int] NULL,
[State] [char](2) NOT NULL,
[Zone] [varchar](30) NOT NULL,
[Company] [varchar](30) NOT NULL,
[Code] [varchar](30) NOT NULL,
[Revenue] [numeric](9, 1) NULL
) ON [PRIMARY]
CREATE TABLE [dbo].[CompanySum](
[State] [char](2) NOT NULL,
[Zone] [varchar](30) NOT NULL,
[Company] [varchar](30) NOT NULL,
[Code] [varchar](30) NOT NULL,
[Revenue] [numeric](9, 1) NULL,
CONSTRAINT [PK_CompanySum] PRIMARY KEY CLUSTERED
(
[State] ASC,
[Zone] ASC
)WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]
) ON [PRIMARY]
DELETE FROM [dbo].[Companies]
GO
INSERT [dbo].[Companies] ([RecordNumber], [State], [Zone], [Company], [Code], [Revenue]) VALUES (1, N'CT', N'B', N'State of CT', N'65453', CAST(10.0 AS Numeric(9, 1)))
GO
INSERT [dbo].[Companies] ([RecordNumber], [State], [Zone], [Company], [Code], [Revenue]) VALUES (2, N'CT', N'B', N'State of CT', N'65453', CAST(3.0 AS Numeric(9, 1)))
GO
INSERT [dbo].[Companies] ([RecordNumber], [State], [Zone], [Company], [Code], [Revenue]) VALUES (3, N'CT', N'B', N'Travelers', N'33443', CAST(20.0 AS Numeric(9, 1)))
GO
INSERT [dbo].[Companies] ([RecordNumber], [State], [Zone], [Company], [Code], [Revenue]) VALUES (4, N'CT', N'C', N'Cigna', N'45678', CAST(24.0 AS Numeric(9, 1)))
INSERT [dbo].[Companies] ([RecordNumber], [State], [Zone], [Company], [Code], [Revenue]) VALUES (5, N'CT', N'C', N'Cigna', N'45678', CAST(234.0 AS Numeric(9, 1)))
GO
INSERT [dbo].[Companies] ([RecordNumber], [State], [Zone], [Company], [Code], [Revenue]) VALUES (6, N'MI', N'A', N'GM', N'48089', CAST(100.0 AS Numeric(9, 1)))
GO
INSERT [dbo].[Companies] ([RecordNumber], [State], [Zone], [Company], [Code], [Revenue]) VALUES (7, N'MI', N'A', N'GM', N'54555', CAST(200.0 AS Numeric(9, 1)))
GO
INSERT [dbo].[Companies] ([RecordNumber], [State], [Zone], [Company], [Code], [Revenue]) VALUES (8, N'MI', N'B', N'Chrysler', N'43434', CAST(44.0 AS Numeric(9, 1)))
GO
This is a hopefully better reconstruction of a previous post of mine, "SQL to return unique combinations of non key columns within a set of key columns", in which I try to clarify the question and provide a simple working example that readers can use.
Please see this SQL Fiddle:
http://sqlfiddle.com/#!18/d0141/1

Is this a solution?
Fiddle: http://sqlfiddle.com/#!18/12e9a0/9
select c.*
from
Companies c
inner join (
select State, Zone
from Companies
group by State, Zone
having count(distinct Company + Code) > 1
) as dup_state_zone
on(
c.State = dup_state_zone.State
and c.Zone = dup_state_zone.Zone
)
Edited - Fix the having clause, with a little cheat...
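One caveat with the Company + Code cheat: plain concatenation can make two different pairs look identical (for example 'AB' + 'C' and 'A' + 'BC'). Below is a minimal sketch that avoids concatenation altogether. It assumes the Companies table from the question, and it relies on Company and Code being NOT NULL as in the posted DDL, so that the <> comparisons behave as expected.

```sql
-- Return every row whose State/Zone group contains at least one other row
-- with a different Company or a different Code.
SELECT c.*
FROM [dbo].[Companies] AS c
WHERE EXISTS (
    SELECT 1
    FROM [dbo].[Companies] AS other
    WHERE other.[State] = c.[State]
      AND other.[Zone]  = c.[Zone]
      AND (other.[Company] <> c.[Company] OR other.[Code] <> c.[Code])
)
ORDER BY c.[RecordNumber];
```

Against the sample data this should list the CT/B and MI/A rows and skip CT/C and MI/B, since those groups hold a single Company/Code pair.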

I used a window ranking function to rank the records by State, ordering by Zone ascending, to get the desired output.
Suggestion: the INSERT into CompanySum will fail because of the primary key constraint, since you are selecting records with duplicate keys. In this case you need to change the primary key constraint a little.
CONSTRAINT [PK_CompanySum] PRIMARY KEY CLUSTERED
(
[State] ASC,
[Zone] ASC
)WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF,
ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]
) ON [PRIMARY]
Since both State and Zone contain duplicate values, this insert will fail. It would be better to add an auto-increment primary key, or to include RecordNumber in the primary key constraint, rather than relying on State and Zone for uniqueness, as there are duplicate values in your desired output.
SELECT
A.[RecordNumber]
,A.[State]
,A.[Zone]
,A.[Company]
,A.Code
,A.Revenue
FROM
(
SELECT *
,RANK() OVER (PARTITION BY [STATE] ORDER BY Zone) AS [row]
FROM Companies
) AS A
WHERE [row] =1
Highlighted are duplicates which will make your insert fail.

Related

Querying hierarchical data with T-SQL getting some aggregated data

I need to query some parent-child tree data in a single table. The structure of this table is defined by a customers database and cannot be changed.
The result we are trying to achieve is one result row per table row, containing the row's own data plus the path from the top to the row itself in a display-friendly way.
Additionally, we need the Top Most Parent that was used in the hierarchy (see row 8 as an example of an orphaned row: its parent no longer exists, which can happen in this customer's database and does not render the data invalid for their use cases) and the most restrictive validity dates in the hierarchy (that is, the max ValidFrom and min ValidTo; NULL represents no restriction in validity).
The table is defined like this:
CREATE TABLE [dbo].[HiTest]
(
[Id] [int] IDENTITY(1,1) NOT NULL,
[ParentId] [int] NULL,
[Name] [varchar](100) NOT NULL,
[ValidFrom] [datetime] NULL,
[ValidTo] [datetime] NULL,
CONSTRAINT [PK_HiTest]
PRIMARY KEY CLUSTERED ([Id] ASC)
WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF,
IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON,
ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]
) ON [PRIMARY]
And the test data:
SET IDENTITY_INSERT [dbo].[HiTest] ON
GO
INSERT [dbo].[HiTest] ([Id], [ParentId], [Name], [ValidFrom], [ValidTo])
VALUES (1, NULL, N'First Level', NULL, NULL),
(2, 1, N'Second Level 1', CAST(N'2022-01-01T00:00:00.000' AS DateTime), CAST(N'2022-12-31T00:00:00.000' AS DateTime)),
(3, 1, N'Second Level 2', NULL, NULL),
(4, 2, N'Third Level 1', CAST(N'2022-02-01T00:00:00.000' AS DateTime), CAST(N'2022-12-31T00:00:00.000' AS DateTime)),
(5, 3, N'Third Level 2', CAST(N'2022-03-01T00:00:00.000' AS DateTime), CAST(N'2022-10-31T00:00:00.000' AS DateTime)),
(6, 4, N'Fourth Level 1a', CAST(N'2022-01-01T00:00:00.000' AS DateTime), CAST(N'2022-09-30T00:00:00.000' AS DateTime)),
(7, 4, N'Fourth Level 1b', NULL, NULL),
(8, 23, N'Orphaned Level', NULL, NULL)
GO
SET IDENTITY_INSERT [dbo].[HiTest] OFF
And the expected result should be:
Can you point me to the best solution for this scenario? It needs to perform well, as there are roughly 120000 rows in this table. At the moment I am playing around with a CTE solution, but I cannot get it to work.
Thank you in advance.
SR
A CTE is going to be the simplest option, given that you have multiple cascading rules that are not simple COALESCE scenarios; that is, it is not simple nullability checking. Judging by your output, the Agg Valid... range can only be more restrictive than the parent range, so if the parent is valid from April to August of the same year, then the child cannot be valid before April or after August. It can, however, be made valid from May to July, as this is a more restrictive range than the parent's.
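To make the "more restrictive range" rule concrete, here is a small standalone sketch. The variable names and dates are made up for illustration (parent valid April to August, child valid May to July); the two CASE expressions follow the same rule the recursive query applies per level.

```sql
-- Hypothetical sample values, written as yyyymmdd so they parse the same way
-- regardless of the session's language or DATEFORMAT settings.
DECLARE @ParentFrom datetime = '20220401', @ParentTo datetime = '20220831';
DECLARE @ChildFrom  datetime = '20220501', @ChildTo  datetime = '20220731';

SELECT
    -- Later of the two start dates; NULL means "no restriction".
    CASE
        WHEN @ParentFrom IS NULL THEN @ChildFrom
        WHEN @ChildFrom > @ParentFrom THEN @ChildFrom
        ELSE @ParentFrom
    END AS AggValidFrom,
    -- Earlier of the two end dates; NULL means "no restriction".
    CASE
        WHEN @ParentTo IS NULL THEN @ChildTo
        WHEN @ChildTo < @ParentTo THEN @ChildTo
        ELSE @ParentTo
    END AS AggValidTo;
```

With these values the child range wins on both sides. Setting the child variables to NULL falls through to the ELSE branch, so the parent's restriction is inherited.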
Outside of the standard Hierarchy CTE logic, the following conditions need to be tracked:
Keep a reference to the original parent record's [ParentId] and pass it through unchanged.
[Path] is the current [Id] if [ParentId] is NULL. If there is a [ParentId] value but no existing parent record, construct the path as [ParentId]\[Id]. Otherwise, append '\' and the child [Id] to the parent [Path].
[ValidFrom]: COALESCE from the parent (first non-null value);
  can be overridden if the child value is greater than the parent's.
[ValidTo]: COALESCE from the parent (first non-null value);
  can be overridden if the child value is less than the parent's.
At the end of the Recursive CTE, we join back onto the main table to pickup the additional fields to display with the current record:
WITH Hierarchy AS (
SELECT [Id], [ParentId] -- < standard Hierarchy requirements
, [ParentId] AS [Master]
, [Path] = ISNULL(CAST([ParentId] AS VARCHAR(max)) + '\' + CAST([Id] AS VARCHAR(max)), CAST([Id] AS VARCHAR(max)))
, [ValidFrom], [ValidTo]
FROM [HiTest]
WHERE [ParentId] IS NULL OR NOT EXISTS (SELECT 1 FROM [HiTest] lookup WHERE lookup.[id] = [HiTest].[ParentId])
UNION ALL
SELECT [child].[Id], [child].[ParentId]
, [parent].[Master] --> don't change this value
, CONCAT([parent].[Path], '\', CAST([child].[Id] AS VARCHAR(max))) AS [Path]
, CASE
WHEN [parent].[ValidFrom] IS NULL THEN [child].[ValidFrom]
WHEN [child].[ValidFrom] > [parent].[ValidFrom] THEN [child].[ValidFrom]
ELSE [parent].[ValidFrom]
END
, CASE
WHEN [parent].[ValidTo] IS NULL THEN [child].[ValidTo]
WHEN [child].[ValidTo] < [parent].[ValidTo] THEN [child].[ValidTo]
ELSE [parent].[ValidTo]
END
FROM [HiTest] [child]
INNER JOIN Hierarchy parent ON [child].[ParentId] = [parent].[Id]
)
SELECT [HiTest].[Id], [HiTest].[ParentId], [Name], [HiTest].[ValidFrom], [HiTest].[ValidTo], [Path]
, [Hierarchy].[ValidFrom] AS [Agg. ValidFrom]
, [Hierarchy].[ValidTo] AS [Agg. ValidTo]
, [Master] AS [Top Most Parent Id]
FROM [Hierarchy]
INNER JOIN [HiTest] ON [Hierarchy].[Id] = [HiTest].[Id]
ORDER BY [Id]
View this fiddle here: https://dbfiddle.uk/5M6QLC8w
Usually we wouldn't sort a list like this by the child item Id (leaf), it would be more common to ORDER BY [Path] (branch) to visualize the Hierarchy:
Id ParentId Name            ValidFrom  ValidTo    Path    Agg. ValidFrom Agg. ValidTo Top Most Parent Id
-- -------- --------------- ---------- ---------- ------- -------------- ------------ ------------------
1  null     First Level     null       null       1       null           null         null
2  1        Second Level 1  2022-01-01 2022-12-31 1\2     2022-01-01     2022-12-31   null
4  2        Third Level 1   2022-02-01 2022-12-31 1\2\4   2022-02-01     2022-12-31   null
6  4        Fourth Level 1a 2022-01-01 2022-09-30 1\2\4\6 2022-02-01     2022-09-30   null
7  4        Fourth Level 1b null       null       1\2\4\7 2022-02-01     2022-12-31   null
3  1        Second Level 2  null       null       1\3     null           null         null
5  3        Third Level 2   2022-03-01 2022-10-31 1\3\5   2022-03-01     2022-10-31   null
8  23       Orphaned Level  2011-01-01 2012-12-31 23\8    2011-01-01     2012-12-31   23
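One thing to watch when sorting by [Path]: it is a string, so once Ids reach two digits '1\10' sorts before '1\2'. A possible workaround, sketched here under the assumption that [Id] is a non-negative int, is to build a second, zero-padded key just for ordering. The [SortKey] column name is made up for this example; the table is the HiTest table from the question.

```sql
WITH Hierarchy AS (
    -- Roots and orphans: same anchor condition as in the query above.
    SELECT h.[Id], h.[ParentId]
         , CAST(RIGHT('0000000000' + CAST(h.[Id] AS varchar(10)), 10) AS varchar(max)) AS [SortKey]
    FROM [dbo].[HiTest] AS h
    WHERE h.[ParentId] IS NULL
       OR NOT EXISTS (SELECT 1 FROM [dbo].[HiTest] AS p WHERE p.[Id] = h.[ParentId])
    UNION ALL
    -- Children: append the zero-padded child Id to the parent's key.
    SELECT child.[Id], child.[ParentId]
         , CAST(parent.[SortKey] + '\'
                + RIGHT('0000000000' + CAST(child.[Id] AS varchar(10)), 10) AS varchar(max))
    FROM [dbo].[HiTest] AS child
    INNER JOIN Hierarchy AS parent ON child.[ParentId] = parent.[Id]
)
SELECT [Id], [ParentId], [SortKey]
FROM Hierarchy
ORDER BY [SortKey];
```

Because every segment has the same width, string order matches numeric order, and a parent always sorts directly ahead of its descendants. The display-friendly [Path] can still be selected alongside it.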

Filter by Id - Exact match or get the default record

Filter the table by exact match; if no match exists, filter by the default Id. Consider the following tables:
Table structure:
CREATE TABLE [dbo].[ProductGroupData]
(
[Id] TINYINT NOT NULL,
[Name] NVARCHAR(50) NOT NULL,
CONSTRAINT [PK_dbo_ProductGroupData]
PRIMARY KEY CLUSTERED ([Id] ASC)
WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF,
IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON,
ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]
) ON [PRIMARY];
CREATE TABLE [dbo].[ProductData]
(
[Id] INT NOT NULL,
[GroupId] TINYINT NOT NULL,
[TypeId] TINYINT NOT NULL,
[Product] NVARCHAR(50) NOT NULL,
CONSTRAINT [PK_dbo_ProductData]
PRIMARY KEY CLUSTERED ([Id] ASC)
WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF,
IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON,
ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]
) ON [PRIMARY];
Sample data:
INSERT INTO [dbo].[ProductGroupData] VALUES (1, N'Apple Box');
INSERT INTO [dbo].[ProductGroupData] VALUES (2, N'Orange Box');
INSERT INTO [dbo].[ProductData] VALUES (1, 1, 1, N'Apple #1');
INSERT INTO [dbo].[ProductData] VALUES (2, 1, 3, N'Apple #3');
INSERT INTO [dbo].[ProductData] VALUES (3, 1, 4, N'Apple #4');
INSERT INTO [dbo].[ProductData] VALUES (4, 1, 5, N'Apple #5');
INSERT INTO [dbo].[ProductData] VALUES (5, 2, 1, N'Orange #1');
INSERT INTO [dbo].[ProductData] VALUES (6, 2, 5, N'Orange #5');
The [TypeId] ranges from 1 to 5 and the default [TypeId] is 1. If no match exists, the query needs to return the row with [TypeId] = 1, in the same SELECT statement. Here I query the table directly for explanatory purposes, but in the actual scenario I use this logic in an INNER JOIN.
I tried with the following scenarios
Scenario #1:
DECLARE @GroupId TINYINT = 1;
DECLARE @DefaultTypeId TINYINT = 1;
DECLARE @TypeId TINYINT = 2;
SELECT PD.*
FROM [dbo].[ProductGroupData] PGD
INNER JOIN [dbo].[ProductData] PD ON PD.[GroupId] = PGD.[Id]
WHERE PD.[GroupId] = @GroupId
AND (PD.[TypeId] = @TypeId OR PD.[TypeId] = @DefaultTypeId);
Scenario #2:
DECLARE @TypeId TINYINT = 3;
The statement in Scenario #1 works fine for a missing Id, because only the default row is returned. With the Scenario #2 value, the same SELECT statement returns two rows (the exact match and the default):
The expected result is
How can I do this in a single SELECT statement? Please assist.
You can use the following query:
;WITH CTE AS
(
SELECT ID FROM [dbo].[ProductData] WHERE [GroupId] = @GroupId AND [TypeId] = @TypeId
)
SELECT PD.*
FROM [dbo].[ProductGroupData] PGD
INNER JOIN [dbo].[ProductData] PD ON PD.[GroupId] = PGD.[Id]
WHERE
(PD.ID IN (SELECT ID FROM CTE))
OR
(NOT EXISTS(SELECT 1 FROM CTE) AND PD.[GroupId] = @GroupId AND PD.[TypeId] = @DefaultTypeId)
The first filter (exact match) is applied by the CTE. The second filter (default record) is applied by the main query only when CTE returns nothing.
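As an alternative for the single-group case, the same "exact match, else default" rule can also be written as a ranking problem: prefer rows with the requested type and keep only the best-ranked ones. This is only a sketch against the ProductData table from the question; it has not been adapted to the INNER JOIN scenario mentioned there.

```sql
DECLARE @GroupId       tinyint = 1;
DECLARE @DefaultTypeId tinyint = 1;
DECLARE @TypeId        tinyint = 2;   -- try 3 to get the exact match instead of the default

-- Rank exact matches as 0 and default rows as 1, then keep the best rank.
-- WITH TIES keeps all rows of that rank in case a type has several rows.
SELECT TOP (1) WITH TIES PD.*
FROM [dbo].[ProductData] AS PD
WHERE PD.[GroupId] = @GroupId
  AND PD.[TypeId] IN (@TypeId, @DefaultTypeId)
ORDER BY CASE WHEN PD.[TypeId] = @TypeId THEN 0 ELSE 1 END;
```

The CASE in the ORDER BY does the fallback: default rows only survive when no row with the requested type exists.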

SQL server query, sort on multiple columns

We have a nested structure of tasks in which every task can contain other tasks. Order of tasks in a task is important and is defined by the Sequence field starting at zero.
Here is my table structure:
USE [MyDB]
GO
SET ANSI_NULLS ON
GO
SET QUOTED_IDENTIFIER ON
GO
CREATE TABLE [dbo].[Relation](
[PK_ID] [int] IDENTITY(1,1) NOT NULL,
[SourceEntityId] [uniqueidentifier] NOT NULL,
[TargetEntityId] [uniqueidentifier] NOT NULL,
CONSTRAINT [PK_Relation] PRIMARY KEY CLUSTERED
(
[PK_ID] ASC
)WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]
) ON [PRIMARY]
GO
SET ANSI_NULLS ON
GO
SET QUOTED_IDENTIFIER ON
GO
CREATE TABLE [dbo].[TaskTable1](
[Id] [uniqueidentifier] NOT NULL,
[Title] [nvarchar](max) NULL,
[SequenceId] [int] NULL
) ON [PRIMARY] TEXTIMAGE_ON [PRIMARY]
GO
SET ANSI_NULLS ON
GO
SET QUOTED_IDENTIFIER ON
GO
CREATE TABLE [dbo].[TaskTable2](
[Id] [uniqueidentifier] NOT NULL,
[Title] [nvarchar](max) NULL,
[SequenceId] [int] NULL
) ON [PRIMARY] TEXTIMAGE_ON [PRIMARY]
GO
SET IDENTITY_INSERT [dbo].[Relation] ON
GO
INSERT [dbo].[Relation] ([PK_ID], [SourceEntityId], [TargetEntityId]) VALUES (1, N'dab00c89-961c-84dd-bb43-cffd18e63594', N'5b266fd1-cbc8-c16a-91c4-5675a35c9ecf')
GO
INSERT [dbo].[Relation] ([PK_ID], [SourceEntityId], [TargetEntityId]) VALUES (2, N'dab00c89-961c-84dd-bb43-cffd18e63594', N'e499ca68-8103-b8ec-06ba-110fa3f6eb5b')
GO
INSERT [dbo].[Relation] ([PK_ID], [SourceEntityId], [TargetEntityId]) VALUES (4, N'dab00c89-961c-84dd-bb43-cffd18e63594', N'645ad2eb-df10-0d5b-0526-408aad45a145')
GO
INSERT [dbo].[Relation] ([PK_ID], [SourceEntityId], [TargetEntityId]) VALUES (5, N'785227d1-393c-ae18-02e5-03ab08d577af', N'5655aeb7-b8b5-dca9-38af-37687c668c14')
GO
INSERT [dbo].[Relation] ([PK_ID], [SourceEntityId], [TargetEntityId]) VALUES (6, N'dab00c89-961c-84dd-bb43-cffd18e63594', N'030cdefc-0e45-01e6-e2a5-a69e303bda4b')
GO
INSERT [dbo].[Relation] ([PK_ID], [SourceEntityId], [TargetEntityId]) VALUES (7, N'dab00c89-961c-84dd-bb43-cffd18e63594', N'0375c7a1-8cc5-a4c8-151c-966e4af83f73')
GO
INSERT [dbo].[Relation] ([PK_ID], [SourceEntityId], [TargetEntityId]) VALUES (8, N'dab00c89-961c-84dd-bb43-cffd18e63594', N'785227d1-393c-ae18-02e5-03ab08d577af')
GO
INSERT [dbo].[Relation] ([PK_ID], [SourceEntityId], [TargetEntityId]) VALUES (9, N'030cdefc-0e45-01e6-e2a5-a69e303bda4b', N'8324bba9-252f-bef8-c018-8b86491e2361')
GO
INSERT [dbo].[Relation] ([PK_ID], [SourceEntityId], [TargetEntityId]) VALUES (10, N'030cdefc-0e45-01e6-e2a5-a69e303bda4b', N'f1cbe8a3-3285-4cf0-096d-aad0327bdb0b')
GO
INSERT [dbo].[Relation] ([PK_ID], [SourceEntityId], [TargetEntityId]) VALUES (11, N'dab00c89-961c-84dd-bb43-cffd18e63594', N'0189f0af-5045-a498-2d70-99187bf3f0ae')
GO
INSERT [dbo].[Relation] ([PK_ID], [SourceEntityId], [TargetEntityId]) VALUES (12, N'785227d1-393c-ae18-02e5-03ab08d577af', N'ffecd091-c17b-ee5f-a64d-54ea9ff65aa9')
GO
SET IDENTITY_INSERT [dbo].[Relation] OFF
GO
INSERT [dbo].[TaskTable1] ([Id], [Title], [SequenceId]) VALUES (N'5b266fd1-cbc8-c16a-91c4-5675a35c9ecf', N'First', 0)
GO
INSERT [dbo].[TaskTable1] ([Id], [Title], [SequenceId]) VALUES (N'e499ca68-8103-b8ec-06ba-110fa3f6eb5b', N'Second', 1)
GO
INSERT [dbo].[TaskTable1] ([Id], [Title], [SequenceId]) VALUES (N'0189f0af-5045-a498-2d70-99187bf3f0ae', N'Fourth', 3)
GO
INSERT [dbo].[TaskTable1] ([Id], [Title], [SequenceId]) VALUES (N'0375c7a1-8cc5-a4c8-151c-966e4af83f73', N'Sixth', 5)
GO
INSERT [dbo].[TaskTable2] ([Id], [Title], [SequenceId]) VALUES (N'030cdefc-0e45-01e6-e2a5-a69e303bda4b', N'Fifth', 4)
GO
INSERT [dbo].[TaskTable2] ([Id], [Title], [SequenceId]) VALUES (N'785227d1-393c-ae18-02e5-03ab08d577af', N'Seventh', 6)
GO
INSERT [dbo].[TaskTable2] ([Id], [Title], [SequenceId]) VALUES (N'645ad2eb-df10-0d5b-0526-408aad45a145', N'Third', 2)
GO
INSERT [dbo].[TaskTable2] ([Id], [Title], [SequenceId]) VALUES (N'8324bba9-252f-bef8-c018-8b86491e2361', N'sub1', 0)
GO
INSERT [dbo].[TaskTable2] ([Id], [Title], [SequenceId]) VALUES (N'f1cbe8a3-3285-4cf0-096d-aad0327bdb0b', N'sub2', 1)
GO
INSERT [dbo].[TaskTable1] ([Id], [Title], [SequenceId]) VALUES (N'ffecd091-c17b-ee5f-a64d-54ea9ff65aa9', N'sub 1', 0)
GO
INSERT [dbo].[TaskTable1] ([Id], [Title], [SequenceId]) VALUES (N'5655aeb7-b8b5-dca9-38af-37687c668c14', N'sub 2', 1)
GO
To get the tasks in order with their child tasks right beneath their parents, I tried the following query to no avail:
;With TaskCTE
AS
(
select R.SourceEntityId AS ParentTask_Id, R.TargetEntityId AS Task_Id , cast(null as uniqueidentifier) AS ParentTask, 0 AS Level
, ROW_NUMBER() OVER (ORDER BY (SELECT 100)) / power(10.0,0) as x
from Relation R
where (R.SourceEntityId = 'DAB00C89-961C-84DD-BB43-CFFD18E63594')
UNION ALL
select R1.SourceEntityId , R1.TargetEntityId, TaskCTE.Task_Id , Level + 1
, x + ROW_NUMBER() OVER (ORDER BY (SELECT 100)) / power(10.0,level+1)
from Relation R1
INNER JOIN TaskCTE
ON R1.SourceEntityId = TaskCTE.Task_Id
)
select ParentTask_Id, Task_Id, ParentTask, Level
, COALESCE(TT1.Title, TT2.Title) AS Title
, COALESCE(TT1.SequenceId, TT2.SequenceId) AS SequenceId
, x
from TaskCTE
LEFT OUTER JOIN TaskTable1 TT1
ON TaskCTE.Task_Id = TT1.Id
LEFT OUTER JOIN TaskTable2 TT2
ON TaskCTE.Task_Id = TT2.Id
order by level , SequenceId
If you follow the structure of the required output (shown in the image below), the Sequence column together with the Level column must determine the sort order.
Thanks in advance
Edit: My query output which is wrong:
If your problem is that the sequence field lives in the task tables rather than in the Relation table, why not join them before running the recursion? It will likely be slower than your initial query, though. Here's a sample:
with cte as (
select
r.SourceEntityId, r.TargetEntityId, t.SequenceId, 0 k
from
Relation r
join (
select * from TaskTable1
union all
select * from TaskTable2
) t on r.TargetEntityId = t.id
---------------------------------------
union all select * from cte where k = 1
---------------------------------------
)
, rcte as (
select
SourceEntityId, TargetEntityId, ParentTask = cast(null as uniqueidentifier)
, SequenceId, rn = cast(row_number() over (order by SequenceId) as varchar(8000)), 1 step
from
cte
where
SourceEntityId = 'DAB00C89-961C-84DD-BB43-CFFD18E63594'
union all
select
a.TargetEntityId, b.TargetEntityId, a.SourceEntityId, b.SequenceId
, cast(concat(a.rn, '.', row_number() over (partition by b.SourceEntityId order by b.SequenceId)) as varchar(8000))
, step + 1
from
rcte a
join cte b on a.TargetEntityId = b.SourceEntityId
)
select
*
from
rcte
order by rn
I have not included your x column, as I cannot tell what you are trying to calculate. Also, in your expected output the values of ParentTask and ParentTask_Id are the same. Is that intended?
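A variation on the same idea, in case row numbers are not needed at all: the sort key can be built directly from SequenceId, zero-padded so that string order matches numeric order. This is a sketch, not a tested replacement. It assumes SequenceId is never NULL and stays below 100000, and the SortKey, Task and Tree names are made up for the example.

```sql
WITH Task AS (
    -- Both task tables have the same shape, so treat them as one set.
    SELECT [Id], [Title], [SequenceId] FROM [dbo].[TaskTable1]
    UNION ALL
    SELECT [Id], [Title], [SequenceId] FROM [dbo].[TaskTable2]
),
Tree AS (
    -- Top-level tasks of the root entity.
    SELECT r.[TargetEntityId] AS Task_Id
         , CAST(NULL AS uniqueidentifier) AS ParentTask
         , 0 AS [Level]
         , t.[Title]
         , t.[SequenceId]
         , CAST(RIGHT('00000' + CAST(t.[SequenceId] AS varchar(5)), 5) AS varchar(8000)) AS SortKey
    FROM [dbo].[Relation] AS r
    INNER JOIN Task AS t ON t.[Id] = r.[TargetEntityId]
    WHERE r.[SourceEntityId] = 'DAB00C89-961C-84DD-BB43-CFFD18E63594'
    UNION ALL
    -- Child tasks: extend the parent's key with the child's padded SequenceId.
    SELECT r.[TargetEntityId]
         , p.Task_Id
         , p.[Level] + 1
         , t.[Title]
         , t.[SequenceId]
         , CAST(p.SortKey + '.' + RIGHT('00000' + CAST(t.[SequenceId] AS varchar(5)), 5) AS varchar(8000))
    FROM Tree AS p
    INNER JOIN [dbo].[Relation] AS r ON r.[SourceEntityId] = p.Task_Id
    INNER JOIN Task AS t ON t.[Id] = r.[TargetEntityId]
)
SELECT Task_Id, ParentTask, [Level], Title, SequenceId, SortKey
FROM Tree
ORDER BY SortKey;
```

Each task's key starts with its parent's key, so children are listed right beneath their parent and siblings follow their SequenceId.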
I am using the same query as @Uzi with a minor correction, and I have the same doubts as him. @Yasser should clearly show the desired output in a proper format and remove unnecessary columns.
If the only purpose of row_number is to order the records, then why convert it to varchar(8000)? You can also avoid the expensive ROW_NUMBER altogether.
Take advantage of PK_ID instead of the expensive row_number, even if PK_ID is not in sequence in this case.
If performance is a big issue, then the user should mention the number of rows in the three tables and what other filters are applied in the WHERE condition.
Why is the data type uniqueidentifier? Would INT serve the purpose?
Read this
Check this query,
WITH cte
AS (
SELECT r.PK_ID
,r.SourceEntityId
,r.TargetEntityId
,t.SequenceId,0 k
FROM #Relation r
JOIN (
SELECT id
,SequenceId
FROM #TaskTable1
UNION ALL
SELECT id
,SequenceId
FROM #TaskTable2
) t ON r.TargetEntityId = t.id
---------------------------------------
--union all select * from cte where k = 1
---------------------------------------
)
,rcte
AS (
SELECT SourceEntityId
,TargetEntityId
,ParentTask = cast(NULL AS UNIQUEIDENTIFIER)
,SequenceId
, rn = cast(row_number() over (order by SequenceId) as decimal(3,1))
--, rn = cast( SequenceId+1 as decimal(3,1))--**
,1 step
FROM cte
WHERE SourceEntityId = 'DAB00C89-961C-84DD-BB43-CFFD18E63594'
UNION ALL
SELECT a.TargetEntityId
,b.TargetEntityId
,a.SourceEntityId
,b.SequenceId
,cast((a.rn+(b.SequenceId/10.0)) as decimal(3,1))
,step + 1
FROM rcte a
JOIN cte b ON a.TargetEntityId = b.SourceEntityId
)
SELECT *
FROM rcte
ORDER BY rn
--**
--SELECT *
--FROM rcte
--ORDER BY rn,st
-- 2nd edit:
I understand that there is no way of changing the database.
In that case it is logical to create an indexed view where the task table id is the clustered index.
select id, SequenceId from #TaskTable1
union all
select id, SequenceId from #TaskTable2
Create nonclustered index NCI_Relation_SourceID on Relation([SourceEntityId])
Create nonclustered index NCI_Relation_TargetEntityId on Relation([TargetEntityId])
You can try this combination once:
Remove PK_ID as the clustered index and make TargetEntityId the clustered index.
You can also try creating a view on this query:
SELECT r.PK_ID
,r.SourceEntityId
,r.TargetEntityId
,t.SequenceId
FROM #Relation r
JOIN (
SELECT id
,SequenceId
FROM #TaskTable1
UNION ALL
SELECT id
,SequenceId
FROM #TaskTable2
) t ON r.TargetEntityId = t.id
Adding a new column named Hierarchy to the CTE expression and sorting the result by this value could solve your requirement.
Here is the modified CTE query:
;With TaskCTE AS
(
select
R.SourceEntityId AS ParentTask_Id,
R.TargetEntityId AS Task_Id , cast(null as uniqueidentifier) AS ParentTask, 0 AS Level
, ROW_NUMBER() OVER (ORDER BY (SELECT 100)) / power(10.0,0) as x
,CAST( ROW_NUMBER() OVER (ORDER BY R.SourceEntityId) as varchar(max)) Hierarchy
from Relation R
where (R.SourceEntityId = 'DAB00C89-961C-84DD-BB43-CFFD18E63594')
UNION ALL
select R1.SourceEntityId , R1.TargetEntityId, TaskCTE.Task_Id , Level + 1
, x + ROW_NUMBER() OVER (ORDER BY (SELECT 100)) / power(10.0,level+1)
,CAST(Hierarchy + ':' + CAST(ROW_NUMBER() OVER (ORDER BY R1.SourceEntityId) as varchar(max)) as varchar(max)) as Hierarchy
from Relation R1
INNER JOIN TaskCTE
ON R1.SourceEntityId = TaskCTE.Task_Id
)
select ParentTask_Id, Task_Id, ParentTask, Level
, COALESCE(TT1.Title, TT2.Title) AS Title
, COALESCE(TT1.SequenceId, TT2.SequenceId) AS SequenceId
, x
,Hierarchy
from TaskCTE
LEFT OUTER JOIN TaskTable1 TT1
ON TaskCTE.Task_Id = TT1.Id
LEFT OUTER JOIN TaskTable2 TT2
ON TaskCTE.Task_Id = TT2.Id
order by Hierarchy
Please note that I have added a Hierarchy column whose value is calculated using the ROW_NUMBER() function, which creates a unique integer value for each task.
You can find an implementation of this hierarchy query with a SQL CTE in the referenced tutorial.
I hope it helps.
I am also adding the output as a screenshot here to show how the data is sorted according to Hierarchy.
Although children are listed after their parents, as far as I can see it does not match your desired outcome one-to-one.

Product Quantity Deduction On Condition

I have a requirement to deduct product quantity based on a condition. It seems a little complicated, and I am not sure how to do it with a SQL query. Here is the concept: "product" here means raw material. For production, we have to deduct raw materials from stock. There are a few rules to follow:
Table - ProductEntry:
i) Products are purchased with a PO (purchase order) and an invoice number from the supplier. There is a condition in this case. Suppose 100 units of product id 1001 have been purchased and they arrived in two sections, as follows:
Id - ProductId - PO - Invoice no - Quantity - Price - EntryDate
1st section: 1 - 1001 - PO-102 - Inv-122 - 20 - 200 - 2017-07-10 10:00:00
2nd section: 2 - 1001 - PO-102 - Inv-122 - 80 - 800 - 2017-07-10 11:00:00
3 - 1002 - PO-102 - Inv-122 - 20 - 400 - 2017-07-10 10:00:00
This is where it gets tricky. In many cases the raw material or product may arrive in multiple sections, or all at once (say, 100 pieces in total).
ii) After purchase, the product has to go into the store, and there is another procedure: each purchased product must be received with an IP (import permit) number separately, as follows:
Table - IpEntry:
Id - ProductId - Invoice no - IP - AnotherProductId
1 - 1001 - Inv-122 - IP2244 - 2
2 - 1001 - Inv-122 - IP2244 - 2
3 - 1002 - Inv-122 - IP2244 - 4
iii) After the products are received, they are used for production, which means there will be consumption. Consumption must use the earliest-entered product or raw material first. That is, if product id 1001 has to be deducted, the entry with the earliest 'EntryDate' is deducted first, here the one entered at 10:00:00 on that date. So for deduction or consumption, the following should take place:
Table - Consumption:
Id - Consumption no - AnotherProductId - Quantity
1 - Con-122 - 2 - 10
3 - Con-122 - 4 - 10
So the final output will be the following:
Id - AnotherProductId - Stock - Quantity Used - Remaining Balance
1 - 2 - 10 - 10 - 100
2 - 4 - 10 - 10 - 200
I am not sharing my SQL query here, as it is not accurate; it simply uses INNER JOIN and MIN and returns the following:
Id - AnotherProductId - Stock - Quantity Used - Remaining Balance
1 - 2 - 10 - 10 - 100
2 - 2 - 10 - 10 - 100 //It returns AnotherProductId 2 (ProductId 1001) twice, although it should only be returned once
3 - 4 - 10 - 10 - 200
I am not sure how to deal with the above scenario, specifically the same product arriving with different quantities, and I am a little confused.
Here is the script for better understanding:
USE [Demo]
GO
/****** Object: Table [dbo].[ProductEntry] Script Date: 07/19/2017 20:37:41 ******/
SET ANSI_NULLS ON
GO
SET QUOTED_IDENTIFIER ON
GO
CREATE TABLE [dbo].[ProductEntry](
[Id] [int] IDENTITY(1,1) NOT NULL,
[ProductId] [int] NULL,
[PO] [nvarchar](60) NULL,
[Invoice No] [nvarchar](60) NULL,
[Quantity] [float] NULL,
[Price] [float] NULL,
[EntryDate] [datetime] NULL,
CONSTRAINT [PK_ProductEntry] PRIMARY KEY CLUSTERED
(
[Id] ASC
)WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]
) ON [PRIMARY]
GO
SET IDENTITY_INSERT [dbo].[ProductEntry] ON
INSERT [dbo].[ProductEntry] ([Id], [ProductId], [PO], [Invoice No], [Quantity], [Price], [EntryDate]) VALUES (1, 1001, N'PO-102', N'Inv-122', 20, 200, CAST(0x0000A7AC00A4CB80 AS DateTime))
INSERT [dbo].[ProductEntry] ([Id], [ProductId], [PO], [Invoice No], [Quantity], [Price], [EntryDate]) VALUES (2, 1001, N'PO-102', N'Inv-122', 80, 800, CAST(0x0000A7AC00B54640 AS DateTime))
INSERT [dbo].[ProductEntry] ([Id], [ProductId], [PO], [Invoice No], [Quantity], [Price], [EntryDate]) VALUES (3, 1002, N'PO-102', N'Inv-122', 20, 400, CAST(0x0000A7AC00A4CB80 AS DateTime))
SET IDENTITY_INSERT [dbo].[ProductEntry] OFF
/****** Object: Table [dbo].[IpEntry] Script Date: 07/19/2017 20:37:41 ******/
SET ANSI_NULLS ON
GO
SET QUOTED_IDENTIFIER ON
GO
CREATE TABLE [dbo].[IpEntry](
[Id] [int] IDENTITY(1,1) NOT NULL,
[ProductId] [int] NULL,
[Invoice No] [nvarchar](60) NULL,
[IP] [nvarchar](60) NULL,
[AnotherProductId] [int] NULL,
CONSTRAINT [PK_IpEntry] PRIMARY KEY CLUSTERED
(
[Id] ASC
)WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]
) ON [PRIMARY]
GO
SET IDENTITY_INSERT [dbo].[IpEntry] ON
INSERT [dbo].[IpEntry] ([Id], [ProductId], [Invoice No], [IP], [AnotherProductId]) VALUES (1, 1001, N'Inv-122', N'IP2244', 2)
INSERT [dbo].[IpEntry] ([Id], [ProductId], [Invoice No], [IP], [AnotherProductId]) VALUES (2, 1001, N'Inv-122', N'IP2244', 2)
INSERT [dbo].[IpEntry] ([Id], [ProductId], [Invoice No], [IP], [AnotherProductId]) VALUES (3, 1002, N'Inv-122', N'IP2244', 4)
SET IDENTITY_INSERT [dbo].[IpEntry] OFF
/****** Object: Table [dbo].[Consumption] Script Date: 07/19/2017 20:37:41 ******/
SET ANSI_NULLS ON
GO
SET QUOTED_IDENTIFIER ON
GO
CREATE TABLE [dbo].[Consumption](
[Id] [int] IDENTITY(1,1) NOT NULL,
[Consumption no] [nvarchar](40) NULL,
[AnotherProductId] [int] NULL,
[Quantity] [float] NULL,
CONSTRAINT [PK_Consumption] PRIMARY KEY CLUSTERED
(
[Id] ASC
)WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]
) ON [PRIMARY]
GO
SET IDENTITY_INSERT [dbo].[Consumption] ON
INSERT [dbo].[Consumption] ([Id], [Consumption no], [AnotherProductId], [Quantity]) VALUES (1, N'Con-122 ', 2, 10)
INSERT [dbo].[Consumption] ([Id], [Consumption no], [AnotherProductId], [Quantity]) VALUES (2, N'Con-122 ', 4, 10)
SET IDENTITY_INSERT [dbo].[Consumption] OFF
This should give you expected result. Please try.
;WITH CTE AS (
select DISTINCT ProductID,AnotherProductId,Balance,
CASE WHEN Balance>=0 THEN 'P' ELSE 'N' END Flag, row_number() over(partition by AnotherProductId order by Balance) RID
FROM (SELECT DISTINCT P.ProductID,I.AnotherProductId,(P.Quantity-C.Quantity) 'Balance'
FROM [ProductEntry] P INNER JOIN [IpEntry] I ON I.ProductID=P.ProductId
INNER JOIN (SELECT [AnotherProductId],SUM([Quantity]) [Quantity] FROM [Consumption] GROUP BY [AnotherProductId]) C ON C.AnotherProductId=I.AnotherProductId
)A
)
select T.AnotherProductId,Balance as Stock, C.Quantity as 'Quantity Used',MIN((P.Price *(P.Quantity-C.Quantity)/P.Quantity)) 'Remaining Balance'
FROM [ProductEntry] P INNER JOIN CTE T ON T.ProductID=P.ProductId AND (RID=1 OR Flag='N')
INNER JOIN (SELECT DISTINCT ProductId,AnotherProductId FROM [IpEntry]) I ON I.ProductID=P.ProductId
INNER JOIN (SELECT [AnotherProductId],SUM([Quantity]) [Quantity] FROM [Consumption] GROUP BY [AnotherProductId]) C ON C.AnotherProductId=I.AnotherProductId
GROUP BY T.AnotherProductId,Balance, C.Quantity
This is expected to cover all the scenarios.
SELECT DISTINCT P.ProductID,P.Quantity,-1 Flag,C.[Quantity] Balance
INTO #TMP
FROM [ProductEntry] P
INNER JOIN [IpEntry] I ON I.ProductID=P.ProductId
INNER JOIN (SELECT [AnotherProductId],SUM([Quantity]) [Quantity] FROM [Consumption] GROUP BY [AnotherProductId])C ON C.AnotherProductId=I.AnotherProductId
DECLARE @Counter INT=1
WHILE((SELECT TOP 1 1 FROM #TMP WHERE Flag=-1 )=1)
BEGIN
UPDATE T SET T.Balance = T.Balance-T.Quantity,
T.Quantity = CASE WHEN T.Quantity-T.Balance>=0 THEN T.Quantity-T.Balance ELSE 0 END,
T.Flag = CASE WHEN T.Quantity-T.Balance>=0 THEN 0 ELSE 1 END
FROM (SELECT ProductId,Quantity,row_number() over (partition by ProductId order by Quantity)RID FROM [ProductEntry])P
INNER JOIN [IpEntry] I ON I.ProductID=P.ProductId and P.RID=@Counter
INNER JOIN (SELECT ProductId,Quantity,Flag,Balance,row_number() over (partition by ProductId order by Quantity)RID FROM #TMP ) T ON T.ProductID=P.ProductID and T.RID=@Counter
INNER JOIN (SELECT [AnotherProductId],SUM([Quantity]) [Quantity] FROM [Consumption] GROUP BY [AnotherProductId])C ON C.AnotherProductId=I.AnotherProductId
UPDATE T1 SET Balance=T2.Balance
FROM #TMP T1 INNER JOIN #TMP T2 ON T1.ProductId=T2.ProductId
WHERE T2.Flag IN (0,1)
UPDATE T1 SET Flag= (SELECT T2.Flag FROM #TMP T2 WHERE T1.ProductId=T2.ProductId AND T2.Flag=0)
FROM #TMP T1
WHERE Flag=0
SET @Counter=@Counter+1
SELECT * FROM #TMP
END
SELECT ProductId,Quantity FROM #TMP --You can add more details by joining with other tables as per your requirement
drop table #TMP

Exclusive-Or in SQL Many to Many Relationship

I have a scenario where I want to find a list of records in a table joined to another through a many to many relationship using an exclusive-or type of relationship. Given the contrived example below, I need a list of categories that are assigned to at least one article, but not all articles. I could brute force this by looping through all of the categories, but that's extremely inefficient. Is there a nice clean way to do this in T-SQL on MS SQL Server?
CREATE TABLE [dbo].[ArticleCategories](
[ArticleId] [int] NOT NULL,
[CategoryId] [int] NOT NULL,
CONSTRAINT [PK_ArticleCategories] PRIMARY KEY CLUSTERED
(
[ArticleId] ASC,
[CategoryId] ASC
)WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]
) ON [PRIMARY]
CREATE TABLE [dbo].[Articles](
[Id] [int] NOT NULL,
[Title] [nvarchar](100) NOT NULL,
CONSTRAINT [PK_Articles] PRIMARY KEY CLUSTERED
(
[Id] ASC
)WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]
) ON [PRIMARY]
CREATE TABLE [dbo].[Categories](
[Id] [int] NOT NULL,
[CategoryName] [nvarchar](100) NULL,
CONSTRAINT [PK_Categories] PRIMARY KEY CLUSTERED
(
[Id] ASC
)WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]
) ON [PRIMARY]
insert into Articles ( Id, Title ) values ( 1, 'Jon Snow')
insert into Articles ( Id, Title ) values ( 2, 'Joffry Baratheon')
insert into Articles ( Id, Title ) values ( 3, 'Cercei Lanister')
insert into Articles ( Id, Title ) values ( 4, 'Sansa Stark')
insert into Articles ( Id, Title ) values ( 5, 'Khal Drogo')
insert into Articles ( Id, Title ) values ( 6, 'Ramsey Bolton')
insert into Articles ( Id, Title ) values ( 7, 'Melisandre')
insert into Categories ( Id, CategoryName ) values ( 1, 'Orange')
insert into Categories ( Id, CategoryName ) values ( 2, 'Blue')
insert into Categories ( Id, CategoryName ) values ( 3, 'Purple')
insert into Categories ( Id, CategoryName ) values ( 4, 'Green')
insert into Categories ( Id, CategoryName ) values ( 5, 'Violet')
insert into Categories ( Id, CategoryName ) values ( 6, 'Yellow')
insert into Categories ( Id, CategoryName ) values ( 7, 'Black')
insert into ArticleCategories (ArticleId, CategoryId) values (1, 1 )
insert into ArticleCategories (ArticleId, CategoryId) values (2, 1 )
insert into ArticleCategories (ArticleId, CategoryId) values (3, 1 )
insert into ArticleCategories (ArticleId, CategoryId) values (4, 1 )
insert into ArticleCategories (ArticleId, CategoryId) values (5, 1 )
insert into ArticleCategories (ArticleId, CategoryId) values (6, 1 )
insert into ArticleCategories (ArticleId, CategoryId) values (7, 1 )
insert into ArticleCategories (ArticleId, CategoryId) values (2, 2 )
insert into ArticleCategories (ArticleId, CategoryId) values (3, 2 )
insert into ArticleCategories (ArticleId, CategoryId) values (5, 3 )
insert into ArticleCategories (ArticleId, CategoryId) values (7, 3 )
In this scenario, the query would not return the category 'Orange' because it is assigned to all of the articles. It would return 'Blue' and 'Purple' because they are assigned to at least one article, but not all. The other categories would not be returned because they aren't assigned to any article.
The expected results would be:
2|Blue
3|Purple
Updated to include sample data and expected output.
The conditions can be tested without joins. The join is only necessary to get the category name into the result.
SELECT AC.CategoryId, C.CategoryName
FROM
ArticleCategories AC
INNER JOIN Categories C
ON AC.CategoryId = C.Id
GROUP BY AC.CategoryId, C.CategoryName
HAVING Count(*) < (SELECT Count(*) FROM Articles)
The table ArticleCategories only contains categories that have been assigned to at least one article, so no extra condition is required for the "at least one article" part.
Since the Primary Key of ArticleCategories includes both columns (ArticleId and CategoryId) there can be no duplicate article per category. Therefore, the count per category is equal to the number of articles this category has been assigned to.
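To make the numbers behind this reasoning visible, here is a small sketch (not part of the original query) that lists each category's row count next to the total number of articles. The aliases AssignedArticles and TotalArticles are made up for illustration.

```sql
-- One row per category that appears in ArticleCategories.
-- Because (ArticleId, CategoryId) is the primary key, COUNT(*) per category
-- equals the number of distinct articles that carry the category.
SELECT AC.CategoryId,
       COUNT(*) AS AssignedArticles,                      -- illustrative alias
       (SELECT COUNT(*) FROM Articles) AS TotalArticles   -- illustrative alias
FROM ArticleCategories AC
GROUP BY AC.CategoryId
ORDER BY AC.CategoryId;
```

With the sample data from the question this should show 7 of 7 for category 1 and 2 of 7 for categories 2 and 3, so only the latter two have a count below the total.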
Note that I am using the HAVING-clause, not the WHERE-clause. The WHERE-clause is applied before grouping. The HAVING-clause is applied after grouping and can refer to aggregate results.
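As a rough illustration of the difference, assuming the sample tables from the question, the following sketch uses both clauses in one query. The concrete filter values (ArticleId <= 3, at least two rows) are arbitrary and only serve the demonstration.

```sql
-- WHERE removes individual rows before the groups are formed;
-- HAVING removes whole groups after the aggregate has been computed.
SELECT AC.CategoryId,
       COUNT(*) AS ArticleCount          -- illustrative alias
FROM ArticleCategories AC
WHERE AC.ArticleId <= 3                  -- row filter: only look at articles 1 to 3
GROUP BY AC.CategoryId
HAVING COUNT(*) >= 2                     -- group filter: keep categories with 2 or more remaining rows
ORDER BY AC.CategoryId;
```

Putting COUNT(*) >= 2 into the WHERE clause instead would be rejected, because the aggregate does not exist yet at the point where WHERE is evaluated.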
Using your sample: http://rextester.com/THCJ13143
and a query using group by and having:
SELECT AC.CategoryID, c.categoryName
FROM ArticleCategories AC
LEFT JOIN Categories C
on C.ID = AC.CategoryID
GROUP BY AC.CategoryID, c.Categoryname
HAVING count(AC.ArticleID) < (SELECT count(*) FROM Articles)
We get:
CategoryID categoryName
2 Blue
3 Purple
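Both queries above join to Categories before grouping. If you prefer to keep the test itself free of joins, one possible variant (a sketch under the same sample schema, with PC as an arbitrary alias) filters the categories in a derived table first and only then looks up the names:

```sql
-- Step 1 (derived table PC): categories assigned to fewer articles than exist.
-- Step 2 (outer query): join to Categories only to fetch the name.
SELECT C.Id AS CategoryId, C.CategoryName
FROM (
    SELECT CategoryId
    FROM ArticleCategories
    GROUP BY CategoryId
    HAVING COUNT(*) < (SELECT COUNT(*) FROM Articles)
) AS PC                                  -- arbitrary alias for the pre-filtered categories
INNER JOIN Categories C
    ON C.Id = PC.CategoryId
ORDER BY C.Id;
```

This should return the same two rows (Blue and Purple) for the sample data. Whether it performs any differently from the joined versions depends on the optimizer and the data, so treat it as a readability choice rather than a tuning tip.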