SQL Server 2008: many-to-many relationship: Concatenation in SELECT query

There are 3 tables:
Project
Tool
LinkProjectTool
I need a query that lists everything in the Project table plus an extra column called ProjectTools. This column should contain a comma delimited string with all the tool names belonging to each project.
The data is:
Table Project:
ID Name Client
------------------------
0 table Anna
1 chair Bobby
2 workbench James
3 window Jenny
4 shelves Matthew
Table Tool:
ID Name
------------------------
0 hammer
1 measuring tape
2 pliers
3 scissors
4 spanner
5 saw
6 screwdriver
Table LinkProjectTool:
IDProject IDTool
-------------------
0 0
0 3
2 1
2 4
2 5
The result should be:
ID Name Client ProjectTools
-------------------------------------------------------------
0 table Anna hammer, scissors
1 chair Bobby
2 workbench James measuring tape, spanner, saw
3 window Jenny
4 shelves Matthew
Here are the queries I used to create these tables:
CREATE TABLE [dbo].[Project]
(
[ID] [int] NOT NULL,
[Name] [nvarchar](15) NOT NULL,
[Client] [nvarchar](15) NULL
)
INSERT INTO [dbo].[Project]
(ID, Name, Client)
VALUES
(0, 'table', 'Anna'),
(1, 'chair', 'Bobby'),
(2, 'workbench', 'James'),
(3, 'window', 'Jenny'),
(4, 'shelves', 'Matthew')
CREATE TABLE [dbo].[Tool](
[ID] [tinyint] IDENTITY(0,1) NOT NULL,
[Name] [nvarchar](30) NULL,
CONSTRAINT [PK_Tool] PRIMARY KEY CLUSTERED
(
[ID] ASC
)WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]
) ON [PRIMARY]
INSERT INTO [dbo].Tool
(Name)
VALUES
('hammer'),
('measuring tape'),
('pliers'),
('scissors'),
('spanner'),
('saw'),
('screwdriver')
CREATE TABLE [dbo].LinkProjectTool
(
[IDProject] [int] NOT NULL,
[IDTool] [tinyint] NULL
)
INSERT INTO [dbo].LinkProjectTool
(IDProject, IDTool)
VALUES
(0, 0),
(0, 3),
(2, 1),
(2, 4),
(2, 5)
Could you, please, help?
Thank you.

You can use the STUFF function together with FOR XML PATH (see this answer for a more detailed explanation of how they work).
Assuming you want the project tools to be separated by a comma and a blank space, you can use the following query:
SELECT p.ID, p.Name, p.Client,
ProjectTools = STUFF((
SELECT ', ' + t.Name
FROM Tool t
WHERE t.ID IN (SELECT IDTool FROM LinkProjectTool WHERE IDProject = p.ID)
ORDER BY t.ID -- keep the tool order deterministic
FOR XML PATH(''), TYPE).value('.', 'NVARCHAR(MAX)'), 1, 2, '') -- STUFF removes the leading ', '
FROM Project p -- the correlated subquery already finds the tools, so no join or DISTINCT is needed
ORDER BY p.ID
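To sanity-check the expected output without a SQL Server instance, here is a minimal sketch using Python's built-in sqlite3 module. It is not the T-SQL above: it assumes an in-memory SQLite database loaded with the question's sample rows, uses plain LEFT JOINs, and does the comma-joining in Python. The helper name project_tools is made up for this illustration.

```python
import sqlite3

def project_tools(conn):
    """Return (ID, Name, Client, ProjectTools) rows; ProjectTools is '' for projects without tools."""
    rows = conn.execute(
        """
        SELECT p.ID, p.Name, p.Client, t.Name
        FROM Project p
        LEFT JOIN LinkProjectTool lpt ON lpt.IDProject = p.ID
        LEFT JOIN Tool t ON t.ID = lpt.IDTool
        ORDER BY p.ID, t.ID
        """
    ).fetchall()
    grouped = {}  # dicts keep insertion order, so projects stay sorted by ID
    for pid, name, client, tool in rows:
        tools = grouped.setdefault((pid, name, client), [])
        if tool is not None:  # a project without tools yields one row with a NULL tool
            tools.append(tool)
    return [key + (", ".join(tools),) for key, tools in grouped.items()]

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE Project (ID INTEGER, Name TEXT, Client TEXT);
    CREATE TABLE Tool (ID INTEGER, Name TEXT);
    CREATE TABLE LinkProjectTool (IDProject INTEGER, IDTool INTEGER);
    INSERT INTO Project VALUES (0,'table','Anna'),(1,'chair','Bobby'),
        (2,'workbench','James'),(3,'window','Jenny'),(4,'shelves','Matthew');
    INSERT INTO Tool VALUES (0,'hammer'),(1,'measuring tape'),(2,'pliers'),
        (3,'scissors'),(4,'spanner'),(5,'saw'),(6,'screwdriver');
    INSERT INTO LinkProjectTool VALUES (0,0),(0,3),(2,1),(2,4),(2,5);
    """
)
result = project_tools(conn)
for row in result:
    print(row)
```

The point of the sketch is only the shape of the result: every project appears once, and projects without linked tools get an empty tool list instead of disappearing.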

Related

Querying hierarchical data with T-SQL getting some aggregated data

I need to query some parent-child tree data in a single table. The structure of this table is defined by a customers database and cannot be changed.
The result we are trying to achieve is one result row per table row, containing the row's own data plus the path from the top of the hierarchy down to the row itself, in a display-friendly form.
Additionally, we need the top-most parent that was used in the hierarchy and the most restrictive validity dates in the hierarchy (that is, the max ValidFrom and the min ValidTo; NULL represents no restriction in validity). See row 8 for an example of an orphaned row: its parent does not exist anymore, which can legitimately happen in this customer's database and does not make the data invalid for their use cases.
The table is defined like this:
CREATE TABLE [dbo].[HiTest]
(
[Id] [int] IDENTITY(1,1) NOT NULL,
[ParentId] [int] NULL,
[Name] [varchar](100) NOT NULL,
[ValidFrom] [datetime] NULL,
[ValidTo] [datetime] NULL,
CONSTRAINT [PK_HiTest]
PRIMARY KEY CLUSTERED ([Id] ASC)
WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF,
IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON,
ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]
) ON [PRIMARY]
And the test data:
SET IDENTITY_INSERT [dbo].[HiTest] ON
GO
INSERT [dbo].[HiTest] ([Id], [ParentId], [Name], [ValidFrom], [ValidTo])
VALUES (1, NULL, N'First Level', NULL, NULL),
(2, 1, N'Second Level 1', CAST(N'2022-01-01T00:00:00.000' AS DateTime), CAST(N'2022-12-31T00:00:00.000' AS DateTime)),
(3, 1, N'Second Level 2', NULL, NULL),
(4, 2, N'Third Level 1', CAST(N'2022-02-01T00:00:00.000' AS DateTime), CAST(N'2022-12-31T00:00:00.000' AS DateTime)),
(5, 3, N'Third Level 2', CAST(N'2022-03-01T00:00:00.000' AS DateTime), CAST(N'2022-10-31T00:00:00.000' AS DateTime)),
(6, 4, N'Fourth Level 1a', CAST(N'2022-01-01T00:00:00.000' AS DateTime), CAST(N'2022-09-30T00:00:00.000' AS DateTime)),
(7, 4, N'Fourth Level 1b', NULL, NULL),
(8, 23, N'Orphaned Level', NULL, NULL)
GO
SET IDENTITY_INSERT [dbo].[HiTest] OFF
And the expected result should be:
Can you point me to the best solution for this scenario? It needs to perform well, as there are roughly 120,000 rows in this table. At the moment I am playing around with a CTE solution, but I cannot get it to work.
Thank you in advance.
SR
A CTE is going to be the simplest option, given that you have multiple cascading rules that are not simple COALESCE scenarios; that is, it is not simple nullability checking. Judging by your output, the Agg Valid... range can only be more restrictive than the parent range: if the parent is valid from April to August of the same year, then the child cannot be valid before April or after August. It can, however, be made valid from May to July, as this is a more restrictive range than the parent's.
Outside of the standard Hierarchy CTE logic the following conditions need to be tracked:
Keep a reference to the original parent record's [ParentId] and pass it through unchanged.
[Path] is the current [Id] if [ParentId] is NULL. If there is a [ParentId] value but no existing parent record, construct the path as [ParentId]\[Id]. Otherwise append '\' and the child [Id] to the parent [Path].
[ValidFrom]: COALESCE from the parent (first non-null value); it can be overridden if the child's value is greater than the parent's.
[ValidTo]: COALESCE from the parent (first non-null value); it can be overridden if the child's value is less than the parent's.
At the end of the Recursive CTE, we join back onto the main table to pickup the additional fields to display with the current record:
WITH Hierarchy AS (
SELECT [Id], [ParentId] -- < standard Hierarchy requirements
, [ParentId] AS [Master]
, [Path] = ISNULL(CAST([ParentId] AS VARCHAR(max)) + '\' + CAST([Id] AS VARCHAR(max)), CAST([Id] AS VARCHAR(max)))
, [ValidFrom], [ValidTo]
FROM [HiTest]
WHERE [ParentId] IS NULL OR NOT EXISTS (SELECT 1 FROM [HiTest] lookup WHERE lookup.[id] = [HiTest].[ParentId])
UNION ALL
SELECT [child].[Id], [child].[ParentId]
, [parent].[Master] --> don't change this value
, CONCAT([parent].[Path], '\', CAST([child].[Id] AS VARCHAR(max))) AS [Path]
, CASE
WHEN [parent].[ValidFrom] IS NULL THEN [child].[ValidFrom]
WHEN [child].[ValidFrom] > [parent].[ValidFrom] THEN [child].[ValidFrom]
ELSE [parent].[ValidFrom]
END
, CASE
WHEN [parent].[ValidTo] IS NULL THEN [child].[ValidTo]
WHEN [child].[ValidTo] < [parent].[ValidTo] THEN [child].[ValidTo]
ELSE [parent].[ValidTo]
END
FROM [HiTest] [child]
INNER JOIN Hierarchy parent ON [child].[ParentId] = [parent].[Id]
)
SELECT [HiTest].[Id], [HiTest].[ParentId], [Name], [HiTest].[ValidFrom], [HiTest].[ValidTo], [Path]
, [Hierarchy].[ValidFrom] AS [Agg. ValidFrom]
, [Hierarchy].[ValidTo] AS [Agg. ValidTo]
, [Master] AS [Top Most Parent Id]
FROM [Hierarchy]
INNER JOIN [HiTest] ON [Hierarchy].[Id] = [HiTest].[Id]
ORDER BY [Id]
View this fiddle here: https://dbfiddle.uk/5M6QLC8w
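For readers who want to check the cascading rules outside the database, here is a rough sketch of the same logic in plain Python. It is an illustration under assumptions, not a translation of the CTE: the function name aggregate and the dictionary layout are invented here, row 7 is taken from the result table, row 8 uses the NULL dates from the INSERT script, and cycles in the parent links are not handled.

```python
from datetime import date

def aggregate(rows):
    """rows: {id: (parent_id, valid_from, valid_to)} -> {id: (path, agg_from, agg_to, master)}."""
    out = {}

    def visit(node_id):
        if node_id in out:
            return out[node_id]
        parent_id, valid_from, valid_to = rows[node_id]
        if parent_id is None:
            # root row: the path is just its own Id
            result = (str(node_id), valid_from, valid_to, None)
        elif parent_id not in rows:
            # orphan: the parent row is missing, so the path starts at ParentId
            result = (f"{parent_id}\\{node_id}", valid_from, valid_to, parent_id)
        else:
            p_path, p_from, p_to, master = visit(parent_id)
            # most restrictive range: latest ValidFrom, earliest ValidTo, ignoring None
            froms = [d for d in (p_from, valid_from) if d is not None]
            tos = [d for d in (p_to, valid_to) if d is not None]
            result = (f"{p_path}\\{node_id}",
                      max(froms) if froms else None,
                      min(tos) if tos else None,
                      master)  # top-most parent is passed down unchanged
        out[node_id] = result
        return result

    for node_id in rows:
        visit(node_id)
    return out

rows = {
    1: (None, None, None),
    2: (1, date(2022, 1, 1), date(2022, 12, 31)),
    3: (1, None, None),
    4: (2, date(2022, 2, 1), date(2022, 12, 31)),
    5: (3, date(2022, 3, 1), date(2022, 10, 31)),
    6: (4, date(2022, 1, 1), date(2022, 9, 30)),
    7: (4, None, None),
    8: (23, None, None),
}
result = aggregate(rows)
for node_id in sorted(result, key=lambda i: result[i][0]):
    print(node_id, result[node_id])
```

Row 6 is the interesting case: its own ValidFrom (2022-01-01) is earlier than what its ancestors allow, so the aggregated value stays at 2022-02-01, while its ValidTo (2022-09-30) tightens the inherited 2022-12-31.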
Usually we wouldn't sort a list like this by the child item Id (leaf), it would be more common to ORDER BY [Path] (branch) to visualize the Hierarchy:
Id  ParentId  Name             ValidFrom   ValidTo     Path     Agg. ValidFrom  Agg. ValidTo  Top Most Parent Id
----------------------------------------------------------------------------------------------------------------
1   null      First Level      null        null        1        null            null          null
2   1         Second Level 1   2022-01-01  2022-12-31  1\2      2022-01-01      2022-12-31    null
4   2         Third Level 1    2022-02-01  2022-12-31  1\2\4    2022-02-01      2022-12-31    null
6   4         Fourth Level 1a  2022-01-01  2022-09-30  1\2\4\6  2022-02-01      2022-09-30    null
7   4         Fourth Level 1b  null        null        1\2\4\7  2022-02-01      2022-12-31    null
3   1         Second Level 2   null        null        1\3      null            null          null
5   3         Third Level 2    2022-03-01  2022-10-31  1\3\5    2022-03-01      2022-10-31    null
8   23        Orphaned Level   2011-01-01  2012-12-31  23\8     2011-01-01      2012-12-31    23

Azure SQL: Delete User records with redundant nested references

I have a fairly complex requirement to delete UserSubscriptions records with redundant options: if a UserId is associated with two or more subscriptions that have the same options, I need to keep the first subscription and delete the rest. Below are some of the scenarios.
Scenario 1: UserId 1 has three subscriptions (SubscriptionId 1, 2 and 3) and all of them have the same options (Email, Call, Fax), so for UserId 1 the SubscriptionOptionIds 4, 5, 6, 7, 8 and 9 have to be deleted from UserSubscriptions.
Scenario 2: UserId 2 has two subscriptions (SubscriptionId 1 and 2) and they do NOT have the same options, so nothing needs to be deleted.
Scenario 3: UserId 3 has two subscriptions (SubscriptionId 1 and 2) and both have the same option (Email), so for UserId 3 the SubscriptionOptionId 3 has to be deleted from UserSubscriptions.
Below is my Table's DDL+DML
CREATE TABLE [Options](
[Id] [int] IDENTITY(1,1) NOT NULL,
[Name] [nvarchar](255) NOT NULL
) ON [PRIMARY]
GO
ALTER TABLE [Options] ADD CONSTRAINT [PK_Options] PRIMARY KEY CLUSTERED
(
[Id] ASC
)WITH (STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF, ONLINE = OFF, OPTIMIZE_FOR_SEQUENTIAL_KEY = OFF) ON [PRIMARY]
GO
CREATE TABLE [Subscriptions](
[Id] [int] IDENTITY(1,1) NOT NULL,
[Name] [nvarchar](255) NOT NULL
) ON [PRIMARY]
GO
ALTER TABLE [Subscriptions] ADD CONSTRAINT [PK_Subscriptions] PRIMARY KEY CLUSTERED
(
[Id] ASC
)WITH (STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF, ONLINE = OFF, OPTIMIZE_FOR_SEQUENTIAL_KEY = OFF) ON [PRIMARY]
GO
CREATE TABLE [SubscriptionsOptions](
[Id] [int] IDENTITY(1,1) NOT NULL,
[OptionId] [int] NOT NULL,
[SubscriptionId] [int] NOT NULL
) ON [PRIMARY]
GO
ALTER TABLE [SubscriptionsOptions] ADD CONSTRAINT [PK_SubscriptionsOptions] PRIMARY KEY CLUSTERED
(
[Id] ASC
)WITH (STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF, ONLINE = OFF, OPTIMIZE_FOR_SEQUENTIAL_KEY = OFF) ON [PRIMARY]
GO
ALTER TABLE [SubscriptionsOptions] WITH CHECK ADD CONSTRAINT [FK_SubscriptionsOptions.SubscriptionId_Subscriptions.Id] FOREIGN KEY([SubscriptionId])
REFERENCES [Subscriptions] ([Id])
GO
ALTER TABLE [SubscriptionsOptions] CHECK CONSTRAINT [FK_SubscriptionsOptions.SubscriptionId_Subscriptions.Id]
GO
ALTER TABLE [SubscriptionsOptions] WITH CHECK ADD CONSTRAINT [FK_SubscriptionsOptions.OptionId_Options.Id] FOREIGN KEY([OptionId])
REFERENCES [Options] ([Id])
GO
ALTER TABLE [SubscriptionsOptions] CHECK CONSTRAINT [FK_SubscriptionsOptions.OptionId_Options.Id]
GO
CREATE TABLE [UserSubscriptions](
[Id] [int] IDENTITY(1,1) NOT NULL,
[SubscriptionsOptionsId] [int] NOT NULL,
[userid] [int] NOT NULL
) ON [PRIMARY]
GO
ALTER TABLE [UserSubscriptions] ADD CONSTRAINT [PK_UserSubscriptions] PRIMARY KEY CLUSTERED
(
[Id] ASC
)WITH (STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF, ONLINE = OFF, OPTIMIZE_FOR_SEQUENTIAL_KEY = OFF) ON [PRIMARY]
GO
ALTER TABLE [UserSubscriptions] WITH CHECK ADD CONSTRAINT [FK_UserSubscriptions.SubscriptionsOptionsId_SubscriptionsOptions.Id] FOREIGN KEY([SubscriptionsOptionsId])
REFERENCES [SubscriptionsOptions] ([Id])
GO
ALTER TABLE [UserSubscriptions] CHECK CONSTRAINT [FK_UserSubscriptions.SubscriptionsOptionsId_SubscriptionsOptions.Id]
GO
INSERT INTO Options VALUES ('E-mail');
INSERT INTO Options VALUES ('Call');
INSERT INTO Options VALUES ('Fax');
INSERT INTO Subscriptions VALUES ('Promo1');
INSERT INTO Subscriptions VALUES ('Promo2');
INSERT INTO Subscriptions VALUES ('Promo3');
INSERT INTO SubscriptionsOptions VALUES (1,1);
INSERT INTO SubscriptionsOptions VALUES (1,2);
INSERT INTO SubscriptionsOptions VALUES (1,3);
INSERT INTO SubscriptionsOptions VALUES (2,1);
INSERT INTO SubscriptionsOptions VALUES (2,2);
INSERT INTO SubscriptionsOptions VALUES (2,3);
INSERT INTO SubscriptionsOptions VALUES (3,1);
INSERT INTO SubscriptionsOptions VALUES (3,2);
INSERT INTO SubscriptionsOptions VALUES (3,3);
INSERT INTO UserSubscriptions VALUES (1,1);
INSERT INTO UserSubscriptions VALUES (1,2);
INSERT INTO UserSubscriptions VALUES (1,3);
INSERT INTO UserSubscriptions VALUES (1,4);
INSERT INTO UserSubscriptions VALUES (1,5);
INSERT INTO UserSubscriptions VALUES (1,6);
INSERT INTO UserSubscriptions VALUES (1,7);
INSERT INTO UserSubscriptions VALUES (1,8);
INSERT INTO UserSubscriptions VALUES (1,9);
INSERT INTO UserSubscriptions VALUES (2,1);
INSERT INTO UserSubscriptions VALUES (2,2);
INSERT INTO UserSubscriptions VALUES (2,3);
INSERT INTO UserSubscriptions VALUES (2,4);
INSERT INTO UserSubscriptions VALUES (2,5);
INSERT INTO UserSubscriptions VALUES (3,1);
INSERT INTO UserSubscriptions VALUES (3,3);
Options
1 E-mail
2 Call
3 Fax
Subscriptions
1 Promo1
2 Promo2
3 Promo3
SubscriptionsOptions
id SubscriptionId OptionId
1 1 1
2 1 2
3 1 3
4 2 1
5 2 2
6 2 3
7 3 1
8 3 2
9 3 3
UserSubscriptions
userid SubscriptionOptionId
1 1
1 2
1 3
1 4
1 5
1 6
1 7
1 8
1 9
2 1
2 2
2 3
2 4
2 5
3 1
3 3
The final output of UserSubscriptions has to look like the table below. I am having a hard time writing the deletion script for the scenarios above, and I would really appreciate it if someone could help me with it.
userid SubscriptionOptionId
1 1
1 2
1 3
2 1
2 2
2 3
2 4
2 5
3 1
Your requirement does not seem to match your expected output.
Your requirement says "delete UserSubscriptions records with redundant options, meaning if a UserId is associated with two or more subscriptions with same options I need to keep the first subscription and then delete the rest of the subscriptions". But in your expected output that is not satisfied.
For example, in your expected output you have deleted subscriptionOptionId 4 from userId 1, but you have not deleted it from userId 2. Why not?
Conversely, for userId 3 you have deleted subscriptionOptionId 3 from UserSubscriptions, but subscriptionOptionId 1 has optionId 1, whereas subscriptionOptionId 3 has optionId 3. These are different options, so according to your requirement neither should have been deleted.
A query which actually deletes rows according to your stated requirement (but which will not produce your expected result) would be:
delete sub
from userSubscriptions sub
join SubscriptionsOptions opt on opt.id = sub.SubscriptionsOptionsId
where exists
(
select *
from userSubscriptions prevSub
join SubscriptionsOptions prevOpt on prevOpt.Id = prevSub.SubscriptionsOptionsId
where prevSub.userId = sub.userId
and prevOpt.optionId = opt.optionId -- an earlier row for the same user already uses this option
and prevSub.id < sub.id
)
Having said that, if I think about what this is modelling a little bit, I think the semantics of your requirements as described aren't really what you want. If I, as a user, am subscribed to two different promotions, it seems sensible that I could have both of those promotions use the same delivery option.
However, what would be redundant would be a situation where I am subscribed to a promotion, and the database says that I want that promo sent to me in a particular way, and it says it twice. For example, "allmhuran is subscribed to promo1 and wants it delivered by email", and "allmhuran is subscribed to promo1 and wants it delivered by email".
But there are no such redundancies in your sample data, because there are no redundancies in your subscriptionsOptions table. Every row is a unique combination of {subscriptionId, optionId}.
On the off chance that this is what you actually wanted to do, the delete statement only changes slightly:
delete sub
from userSubscriptions sub
join SubscriptionsOptions opt on opt.id = sub.SubscriptionsOptionsId
where exists
(
select *
from userSubscriptions prevSub
join SubscriptionsOptions prevOpt on prevOpt.Id = prevSub.SubscriptionsOptionsId
where prevSub.userId = sub.userId
and prevOpt.optionId = opt.optionId
and prevOpt.subscriptionId = opt.subscriptionId -- this condition was not in the previous statement
and prevSub.id < sub.id
)
If that is the intention, then I would also say that the existing schema is "overnormalized". All you really need is a UserSubscriptionOptions table (userId int, subscriptionId int, optionId int, primary key (userId, subscriptionId, optionId)).
This is simpler than it seems:
Create a view over UserSubscriptions joined to SubscriptionOptions
Order by user, then as desired, e.g. from "earliest" to "latest"
Add a hash across user id and option id
For each duplicate hash delete the corresponding SubscriptionOptions row
Once done, delete any Subscriptions rows with no corresponding SubscriptionOptions rows
This is doable in SQL or the imperative language of your choice.
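As a rough illustration of the "first occurrence wins" idea in these steps, here is a small Python sketch. Several things in it are assumptions rather than part of the answer: the function name rows_to_delete, the use of a (user_id, option_id) tuple in a set instead of a hash column, treating the lowest UserSubscriptions id as "earliest", flagging UserSubscriptions rows rather than deleting anything, and reading the sample data the way the question's printed tables show it.

```python
def rows_to_delete(user_subscriptions, subscription_options):
    """
    user_subscriptions: list of (id, user_id, subscriptions_options_id), in any order.
    subscription_options: {subscriptions_options_id: (subscription_id, option_id)}.
    Returns the ids of UserSubscriptions rows that repeat an earlier
    (user_id, option_id) pair.
    """
    seen = set()
    duplicates = []
    for row_id, user_id, so_id in sorted(user_subscriptions):  # "earliest" = lowest id
        _subscription_id, option_id = subscription_options[so_id]
        key = (user_id, option_id)
        if key in seen:
            duplicates.append(row_id)  # same user already has this option
        else:
            seen.add(key)
    return duplicates

# SubscriptionsOptions as printed in the question: id -> (SubscriptionId, OptionId)
subscription_options = {
    1: (1, 1), 2: (1, 2), 3: (1, 3),
    4: (2, 1), 5: (2, 2), 6: (2, 3),
    7: (3, 1), 8: (3, 2), 9: (3, 3),
}
# UserSubscriptions as printed in the question: (userid, SubscriptionOptionId), ids assigned 1..16
pairs = [(1, so) for so in range(1, 10)] + [(2, so) for so in range(1, 6)] + [(3, 1), (3, 3)]
user_subscriptions = [(i, user, so) for i, (user, so) in enumerate(pairs, start=1)]

flagged = rows_to_delete(user_subscriptions, subscription_options)
print(flagged)
```

With this keying, rows 4 to 9 (user 1) are flagged, but so are rows 13 and 14 (user 2, SubscriptionOptionId 4 and 5), and nothing is flagged for user 3. That is the same mismatch between the stated requirement and the expected output that the first answer points out.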

Write SQL to identify multiple subgroupings within a grouping

I have a program that summarizes non-normalized data in one table and moves it to another and we frequently get a duplicate key violation on the insert due to bad data. I want to create a report for the users to help them identify the cause of the error.
For example, consider the following contrived simple SQL, which summarizes data in the table Companies and inserts it into CompanySum, which has a primary key of State/Zone. For the INSERT not to fail, there cannot be more than one distinct combination of Company/Code for each unique primary key State/Zone combination. If there is, we want the insert to fail so that the data can be corrected.
INSERT INTO CompanySum
(
[State]
,[Zone]
,[Company]
,[Code]
,[Revenue]
)
SELECT
--Keys of target
[State]
,[Zone]
--We are expecting to have one distinct combination of these fields per key grouping
,[Company]
,[Code]
--Aggregate
,SUM([Revenue])
FROM COMPANIES
GROUP BY
[State]
,[Zone]
,[Company]
,[Code]
I would like to create a report to help the users easily identify and correct the data so that there is only one distinct Company/Code combination within a State/Zone. For each distinct State/Zone value, I would like to identify the distinct Company/Code combinations within the State/Zone. If there is more than one Company/Code combination within a State/Zone, I would like all of the records in that State/Zone to be displayed in the output. For example, here is the sample input and desired output:
Data:
RecordNumber State Zone Company Code Revenue
------------ ----- ---- ------- ---- --------
1 CT B State of CT 65453 10
2 CT B State of CT 65453 3
3 CT B Travelers 33443 20
4 CT C Cigna 45678 24
5 CT C Cigna 45678 234
6 MI A GM 48089 100
7 MI A GM 54555 200
8 MI B Chrysler 43434 44
Desired Output:
RecordNumber State Zone Company Code Revenue
------------ ----- ---- ------- ---- --------
1 CT B State of CT 65453 10
2 CT B State of CT 65453 3
3 CT B Travelers 33443 20
6 MI A GM 48089 100
7 MI A GM 54555 200
Here is the DDL and DML needed to create this test scenario
CREATE TABLE [dbo].[Companies](
[RecordNumber] [int] NULL,
[State] [char](2) NOT NULL,
[Zone] [varchar](30) NOT NULL,
[Company] [varchar](30) NOT NULL,
[Code] [varchar](30) NOT NULL,
[Revenue] [numeric](9, 1) NULL
) ON [PRIMARY]
CREATE TABLE [dbo].[CompanySum](
[State] [char](2) NOT NULL,
[Zone] [varchar](30) NOT NULL,
[Company] [varchar](30) NOT NULL,
[Code] [varchar](30) NOT NULL,
[Revenue] [numeric](9, 1) NULL,
CONSTRAINT [PK_CompanySum] PRIMARY KEY CLUSTERED
(
[State] ASC,
[Zone] ASC
)WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]
) ON [PRIMARY]
DELETE FROM [dbo].[Companies]
GO
INSERT [dbo].[Companies] ([RecordNumber], [State], [Zone], [Company], [Code], [Revenue]) VALUES (1, N'CT', N'B', N'State of CT', N'65453', CAST(10.0 AS Numeric(9, 1)))
GO
INSERT [dbo].[Companies] ([RecordNumber], [State], [Zone], [Company], [Code], [Revenue]) VALUES (2, N'CT', N'B', N'State of CT', N'65453', CAST(3.0 AS Numeric(9, 1)))
GO
INSERT [dbo].[Companies] ([RecordNumber], [State], [Zone], [Company], [Code], [Revenue]) VALUES (3, N'CT', N'B', N'Travelers', N'33443', CAST(20.0 AS Numeric(9, 1)))
GO
INSERT [dbo].[Companies] ([RecordNumber], [State], [Zone], [Company], [Code], [Revenue]) VALUES (4, N'CT', N'C', N'Cigna', N'45678', CAST(24.0 AS Numeric(9, 1)))
INSERT [dbo].[Companies] ([RecordNumber], [State], [Zone], [Company], [Code], [Revenue]) VALUES (5, N'CT', N'C', N'Cigna', N'45678', CAST(234.0 AS Numeric(9, 1)))
GO
INSERT [dbo].[Companies] ([RecordNumber], [State], [Zone], [Company], [Code], [Revenue]) VALUES (6, N'MI', N'A', N'GM', N'48089', CAST(100.0 AS Numeric(9, 1)))
GO
INSERT [dbo].[Companies] ([RecordNumber], [State], [Zone], [Company], [Code], [Revenue]) VALUES (7, N'MI', N'A', N'GM', N'54555', CAST(200.0 AS Numeric(9, 1)))
GO
INSERT [dbo].[Companies] ([RecordNumber], [State], [Zone], [Company], [Code], [Revenue]) VALUES (8, N'MI', N'B', N'Chrysler', N'43434', CAST(44.0 AS Numeric(9, 1)))
GO
This is a hopefully better reconstruction of a previous post of mine, "SQL to return unique combinations of non key columns within a set of key columns", in which I am trying to clarify the question and provide a simple working example that readers can use.
Please see this SQL Fiddle:
http://sqlfiddle.com/#!18/d0141/1
Is this a solution?
Fiddle: http://sqlfiddle.com/#!18/12e9a0/9
select c.*
from
Companies c
inner join (
select State, Zone
from Companies
group by State, Zone
having count(distinct Company + Code) > 1
) as dup_state_zone
on(
c.State = dup_state_zone.State
and c.Zone = dup_state_zone.Zone
)
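Edited: fixed the HAVING clause, with a little cheat (Company and Code are concatenated so that COUNT(DISTINCT ...) can treat the pair as a single value).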
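If the concatenation cheat feels fragile (two different Company/Code pairs can concatenate to the same string), one possible alternative is to compare MIN and MAX per group instead. The sketch below is an assumption-laden illustration, not the answerer's query: it runs in SQLite through Python's sqlite3 module with the question's sample rows, and it relies on Company and Code being NOT NULL as in the posted DDL. The HAVING condition itself uses only standard SQL.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE Companies (RecordNumber INTEGER, State TEXT, Zone TEXT,
                            Company TEXT NOT NULL, Code TEXT NOT NULL, Revenue REAL);
    INSERT INTO Companies VALUES
        (1, 'CT', 'B', 'State of CT', '65453', 10),
        (2, 'CT', 'B', 'State of CT', '65453', 3),
        (3, 'CT', 'B', 'Travelers',   '33443', 20),
        (4, 'CT', 'C', 'Cigna',       '45678', 24),
        (5, 'CT', 'C', 'Cigna',       '45678', 234),
        (6, 'MI', 'A', 'GM',          '48089', 100),
        (7, 'MI', 'A', 'GM',          '54555', 200),
        (8, 'MI', 'B', 'Chrysler',    '43434', 44);
    """
)

# A State/Zone group holds more than one Company/Code combination exactly when
# at least one of the two columns takes more than one value inside the group.
query = """
SELECT c.RecordNumber
FROM Companies c
JOIN (
    SELECT State, Zone
    FROM Companies
    GROUP BY State, Zone
    HAVING MIN(Company) <> MAX(Company) OR MIN(Code) <> MAX(Code)
) dup ON dup.State = c.State AND dup.Zone = c.Zone
ORDER BY c.RecordNumber
"""
flagged = [row[0] for row in conn.execute(query)]
print(flagged)  # record numbers that belong to conflicting State/Zone groups
```

On the sample data this returns the five records of CT/B and MI/A, which is the desired output from the question.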
I used a window ranking function to rank the records by State, ordered by Zone ascending, to get the desired output.
Suggestion: the INSERT into CompanySum will fail because of your primary key constraint, since you are selecting records with duplicate keys. In this case you need to change your primary key constraint a little.
CONSTRAINT [PK_CompanySum] PRIMARY KEY CLUSTERED
(
[State] ASC,
[Zone] ASC
)WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF,
ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]
) ON [PRIMARY]
Since State and Zone both contain duplicate values, this insert will fail. It would be better to add an auto-increment primary key, or to include RecordNumber in the primary key constraint instead of relying on State and Zone alone to make it unique, as there are duplicate values in your desired output.
SELECT
A.[RecordNumber]
,A.[State]
,A.[Zone]
,A.[Company]
,A.Code
,A.Revenue
FROM
(
SELECT *
,RANK() OVER (PARTITION BY [STATE] ORDER BY Zone) AS [row]
FROM Companies
) AS A
WHERE [row] =1
Highlighted are duplicates which will make your insert fail.

T-SQL Select row only if not exist already

I have a table with two Ids, ResourceId and LanguageId
I need to combine these two selects so that rows from the second are added only if their ResourceId is not already in the list.
SELECT * FROM Resources WHERE LanguageId = 1
SELECT * FROM Resources WHERE LanguageId = 0
JOIN
/*where ResourceId not present already*/
So far I have come up with nothing except complicated partitions. Is there a better solution to this?
Not all ResourceIds have Language 0 entry
Not all ResourceIds have Language 1 entry
Some ResourceIds have both
CREATE TABLE [dbo].[Resources](
[Id] [bigint] NOT NULL,
[ResourceId] [bigint] NOT NULL,
[LanguageId] [int] NOT NULL,
[Text] [nvarchar](2000) NULL,
[Path] [varchar](2000) NULL,
CONSTRAINT [PK_Resourcces] PRIMARY KEY CLUSTERED
(
[Id] ASC
)WITH (STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF) ON [PRIMARY]
) ON [PRIMARY]
You could use a union with exists logic:
SELECT * FROM Resources WHERE LanguageId = 1
UNION ALL
SELECT *
FROM Resources r1
WHERE
LanguageId = 0 AND
NOT EXISTS (SELECT 1 FROM Resources r2
WHERE r2.LanguageId = 1 AND r2.ResourceId = r1.ResourceId);
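Here is a quick way to try the UNION ALL / NOT EXISTS pattern without SQL Server: a small sketch with Python's sqlite3 module. The two-column table and its rows are assumptions for the test (they mirror the reduced sample data another answer in this thread uses), and the final sorting is done in Python only to make the output order predictable.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE Resources (ResourceId INTEGER, LanguageId INTEGER);
    INSERT INTO Resources VALUES
        (1,1),(1,0),(1,2),(1,3),(2,1),(2,0),(2,2),(3,1),(4,1),(5,0);
    """
)

query = """
SELECT ResourceId, LanguageId FROM Resources WHERE LanguageId = 1
UNION ALL
SELECT r1.ResourceId, r1.LanguageId
FROM Resources r1
WHERE r1.LanguageId = 0
  AND NOT EXISTS (SELECT 1 FROM Resources r2
                  WHERE r2.LanguageId = 1 AND r2.ResourceId = r1.ResourceId)
"""
picked = sorted(conn.execute(query).fetchall())
print(picked)
```

Resources 1 and 2 have both languages and come back with LanguageId 1 only; resource 5 has no LanguageId 1 row, so its LanguageId 0 row is used as the fallback.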
You can number the rows per resourceid by languageid using the row_number() window function and then just select the "first" one.
SELECT id,
resourceid,
languageid,
text,
path
FROM (SELECT id,
resourceid,
languageid,
text,
path,
row_number() OVER (PARTITION BY resourceid
ORDER BY languageid DESC) rn
FROM resources
WHERE languageid IN (0, 1)) x
WHERE rn = 1;
I had started answering when Tim beat me to it, but I am still posting my answer since you said, and I quote:
If somebody finds something faster and simpler, I would love to see it
CREATE DATABASE TEST
GO
USE TEST
CREATE TABLE Ressources
(
RessourceId INT,
LanguageId INT
);
INSERT INTO Ressources
VALUES
(1,1),
(1,0),
(1,2),
(1,3),
(2,1),
(2,0),
(2,2),
(3,1),
(4,1),
(5,0);
WITH CTE_L1 AS (SELECT * FROM Ressources WHERE LanguageId = 1)
SELECT * FROM CTE_L1
UNION ALL
SELECT * FROM Ressources
WHERE LanguageId = 0
AND RessourceId NOT IN(SELECT RessourceId FROM CTE_L1)
Results I got:
RessourceId LanguageId
----------- -----------
1 1
2 1
3 1
4 1
5 0
(Same result if I execute @Tim Biegeleisen's query)
See which one you like best.
--> Cost of my query: 0.010132
--> Cost of Tim's query: 0.0100952
(Based on the execution plan)

SQL Recursive Menu Sorting

I've got a simple table that I'm using to represent a hierarchy of categories.
CREATE TABLE [dbo].[Categories](
[ID] [int] IDENTITY(1,1) NOT NULL,
[Title] [varchar](256) NOT NULL,
[ParentID] [int] NOT NULL,
CONSTRAINT [PK_Categories] PRIMARY KEY CLUSTERED
(
[ID] ASC
)WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]
) ON [PRIMARY]
GO
INSERT INTO [MDS].[dbo].[Categories]([Title],[ParentID]) VALUES ('All', 0)
INSERT INTO [MDS].[dbo].[Categories]([Title],[ParentID]) VALUES ('Banking', 8)
INSERT INTO [MDS].[dbo].[Categories]([Title],[ParentID]) VALUES ('USAA Checking', 2)
INSERT INTO [MDS].[dbo].[Categories]([Title],[ParentID]) VALUES ('USAA Mastercard', 2)
INSERT INTO [MDS].[dbo].[Categories]([Title],[ParentID]) VALUES ('Medical', 8)
INSERT INTO [MDS].[dbo].[Categories]([Title],[ParentID]) VALUES ('Jobs', 8)
INSERT INTO [MDS].[dbo].[Categories]([Title],[ParentID]) VALUES ('Archive', 1)
INSERT INTO [MDS].[dbo].[Categories]([Title],[ParentID]) VALUES ('Active', 1)
INSERT INTO [MDS].[dbo].[Categories]([Title],[ParentID]) VALUES ('BoA Amex', 2)
Everything is fine except for selecting the entire tree. Here is my query, I removed my ORDER BY because it doesn't work:
WITH CategoryTree (ID, Title, Level, ParentID) AS
(
SELECT r.ID, r.Title, 0 Level, r.ParentID
FROM Categories r
WHERE r.ParentID = 0
UNION ALL
SELECT c.ID, c.Title, p.Level + 1 AS Level, c.ParentID
FROM Categories c
INNER JOIN CategoryTree p
ON p.ID = c.ParentID
)
SELECT ID,
REPLICATE('-----', Level) + Title AS Title,
ParentID
FROM CategoryTree
Results:
ID Title ParentID
1 All 0
7 -----Archive 1
8 -----Active 1
2 ----------Banking 8
5 ----------Medical 8
6 ----------Jobs 8
3 ---------------USAA Checking 2
4 ---------------USAA Mastercard 2
9 ---------------BoA Amex 2
The result I want is this:
ID Title ParentID
1 All 0
8 -----Active 1
2 ----------Banking 8
9 ---------------BoA Amex 2
3 ---------------USAA Checking 2
4 ---------------USAA Mastercard 2
6 ----------Jobs 8
5 ----------Medical 8
7 -----Archive 1
What is killing me is I got this working perfectly before but then I forgot to back up the DB and lost it in a server upgrade.
I looked at the HierarchyID type in 2008 but it just seems like a big pain in the ass if you care about order of children at the same level.
Ok, got it :) -- This seems to work here.
DECLARE @Categories TABLE (
ID int PRIMARY KEY
,Title varchar(256)
,ParentID int
)
INSERT INTO @Categories
VALUES
(1, 'All', 0)
,(2,'Banking', 8)
,(3,'USAA Checking', 2)
,(4,'USAA Mastercard', 2)
,(5,'Medical', 8)
,(6,'Jobs', 8)
,(7,'Archive', 1)
,(8,'Active', 1)
,(9,'BoA Amex', 2)
;
WITH CategoryTree
AS (SELECT r.ID, r.Title, 0 Level, r.ParentID,
CAST(r.Title AS VARCHAR(1000)) AS "Path"
FROM @Categories r
WHERE r.ParentID = 0
UNION ALL
SELECT c.ID, c.Title, p.Level + 1 AS Level, c.ParentID,
CAST((p.path + '/' + c.Title) AS VARCHAR(1000)) AS "Path"
FROM @Categories c
INNER JOIN CategoryTree p
ON p.ID = c.ParentID
)
SELECT ID, REPLICATE('-----', Level) + Title AS Title, [Path]
FROM CategoryTree
ORDER BY [Path]
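The path-ordering trick can be tried without SQL Server as well. The sketch below is a SQLite version run through Python's sqlite3 module, so the syntax differs from the T-SQL above (WITH RECURSIVE, || for concatenation, no CAST needed); the table contents are the question's sample rows. One caveat that is my own observation rather than part of the answer: sorting by a path built from titles can misplace siblings whose titles contain characters that sort before '/', such as spaces or hyphens.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE Categories (ID INTEGER PRIMARY KEY, Title TEXT, ParentID INTEGER);
    INSERT INTO Categories VALUES
        (1,'All',0),(2,'Banking',8),(3,'USAA Checking',2),(4,'USAA Mastercard',2),
        (5,'Medical',8),(6,'Jobs',8),(7,'Archive',1),(8,'Active',1),(9,'BoA Amex',2);
    """
)

query = """
WITH RECURSIVE CategoryTree (ID, Title, Level, Path) AS (
    -- anchor: the root rows
    SELECT ID, Title, 0, Title
    FROM Categories
    WHERE ParentID = 0
    UNION ALL
    -- recursive step: append each child's title to its parent's path
    SELECT c.ID, c.Title, p.Level + 1, p.Path || '/' || c.Title
    FROM Categories c
    JOIN CategoryTree p ON p.ID = c.ParentID
)
SELECT ID, Title, Level FROM CategoryTree ORDER BY Path
"""
tree = conn.execute(query).fetchall()
for cat_id, title, level in tree:
    print(cat_id, "-----" * level + title)
```

Because each row's path starts with its parent's path, sorting by path keeps every subtree together and orders siblings alphabetically by title, which is the layout the question asks for.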