Compare two tables and find differences in column values

I want to compare two tables with identical columns and find out which column values changed.
Here is an example with sample data.
employee_original table has 6 columns.
CREATE TABLE [dbo].[employee_original](
[emp_id] [int] IDENTITY(1,1) NOT NULL,
[first_name] [varchar](100) NOT NULL,
[last_name] [varchar](100) NOT NULL,
[salary] int NOT NULL,
[city] [varchar](20) NOT NULL,
[department] [varchar](20) NOT NULL,
PRIMARY KEY CLUSTERED
(
[emp_id] ASC
)WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON, FILLFACTOR = 90) ON [PRIMARY]
) ON [PRIMARY]
GO
INSERT INTO employee_original VALUES ( 'Julia', 'Schultz', 100, 'New York', 'Tech');
INSERT INTO employee_original VALUES ( 'Vincent', 'Trantow', 200, 'Moscow', 'HR');
INSERT INTO employee_original VALUES ( 'Whitney ', 'Pouros', 500, 'Miami', 'Accounting');
INSERT INTO employee_original VALUES ( 'Chandler', 'Osinski', 10, 'Singapore', 'Purchasing');
INSERT INTO employee_original VALUES ( 'Sydnie', 'Green', 700, 'Ireland', 'Operations');
INSERT INTO employee_original VALUES ( 'Josefa', 'Anderson', 800, 'Berlin', 'Purchase');
INSERT INTO employee_original VALUES ( 'Brayan', 'Bergstrom', 900, 'New York', 'Operations');
INSERT INTO employee_original VALUES ( 'Shyanne', 'Kris', 900, 'New York', 'Sales');
employee_modified has the same employees, but some attributes have changed for a few of them.
CREATE TABLE [dbo].[employee_modified](
[emp_id] [int] IDENTITY(1,1) NOT NULL,
[first_name] [varchar](100) NOT NULL,
[last_name] [varchar](100) NOT NULL,
[salary] int NOT NULL,
[city] [varchar](20) NOT NULL,
[department] [varchar](20) NOT NULL,
PRIMARY KEY CLUSTERED
(
[emp_id] ASC
)WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON, FILLFACTOR = 90) ON [PRIMARY]
) ON [PRIMARY]
GO
INSERT INTO employee_modified VALUES ( 'Julia', 'Schultz', 100, 'New York', 'Tech');
INSERT INTO employee_modified VALUES ( 'Vincent', 'Wyman', 500, 'Moscow', 'HR');
INSERT INTO employee_modified VALUES ( 'Whitney ', 'Pouros', 500, 'Miami', 'Sales');
INSERT INTO employee_modified VALUES ( 'Chandler', 'Osinski', 10, 'Singapore', 'Purchasing');
INSERT INTO employee_modified VALUES ( 'Sydnie', ' Cartwright', 900, 'Ireland', 'Operations');
INSERT INTO employee_modified VALUES ( 'Joseph', 'Anderson', 800, 'Berlin', 'Purchase');
INSERT INTO employee_modified VALUES ( 'Bryan', 'Bergstrom', 900, 'Naples', 'Operations');
INSERT INTO employee_modified VALUES ( 'Shyanne', 'Jakubowski', 900, 'New York', 'Accounting');
I am looking for a result that tells me which field changed for which employee.
For example, emp_id = 2 has a last name and a salary change, so the output should look like:
emp_id  attribute  original_value  new_value
2       last_name  Trantow         Wyman
2       salary     200             500
This is what I have tried so far:
(1) Join tables and find what changed :
DROP TABLE IF EXISTS #temp;
SELECT distinct
o.emp_id,
o.first_name [original_first_name], m.first_name [modified_first_name],
o.last_name [original_last_name], m.last_name [modified_last_name],
o.salary [original_salary], m.salary [modified_salary],
o.city [original_city], m.city [modified_city],
o.department [original_department], m.department [modified_department]
into #temp from
[dbo].[employee_original] o inner join [dbo].[employee_modified] m on o.emp_id = m.emp_id
select * from #temp
This gives me one row per employee, with the original and modified value of every column side by side.
(2) Self join with #temp and find out what attribute has changed.
-- All Last Name Changes.
select distinct t1.emp_id, t1.original_last_name, t2.modified_last_name
from #temp t1
inner join #temp t2 on t1.emp_id = t2.emp_id
where t1.original_last_name <> t2.modified_last_name
-- All Department changes
select distinct t1.emp_id, t1.original_department, t2.modified_department
from #temp t1
inner join #temp t2 on t1.emp_id = t2.emp_id
where t1.original_department <> t2.modified_department
Any pointers on how I can get to my desired result?

Here is an option that will dynamically unpivot your data without actually using dynamic SQL (OPENJSON requires SQL Server 2016 or later).
Example
Select emp_id
,[key]
,Org_Value = max( case when Src=1 then Value end)
,New_Value = max( case when Src=2 then Value end)
From (
Select Src=1
,emp_id
,B.*
From [employee_original] A
Cross Apply ( Select [Key]
,Value
From OpenJson( (Select A.* For JSON Path,Without_Array_Wrapper,INCLUDE_NULL_VALUES ) )
) B
Union All
Select Src=2
,emp_id
,B.*
From [employee_modified] A
Cross Apply ( Select [Key]
,Value
From OpenJson( (Select A.* For JSON Path,Without_Array_Wrapper,INCLUDE_NULL_VALUES ) )
) B
) A
Group By emp_id,[key]
Having max( case when Src=1 then Value end)
<> max( case when Src=2 then Value end)
Order By emp_id,[key]

You can use the following code to unpivot all changed values:
SELECT
o.emp_id,
v.column_name,
v.old_value,
v.new_value
FROM employee_original o
JOIN employee_modified m ON o.emp_id = m.emp_id
CROSS APPLY (
SELECT 'first_name', CAST(o.first_name AS nvarchar(max)), CAST(m.first_name AS nvarchar(max))
WHERE o.first_name <> m.first_name
UNION ALL
SELECT 'last_name', o.last_name, m.last_name
WHERE o.last_name <> m.last_name
UNION ALL
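-- cast the int column too; otherwise data type precedence makes UNION ALL try to convert the strings to int
SELECT 'salary', CAST(o.salary AS nvarchar(max)), CAST(m.salary AS nvarchar(max))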
WHERE o.salary <> m.salary
UNION ALL
SELECT 'city', o.city, m.city
WHERE o.city <> m.city
UNION ALL
SELECT 'department', o.department, m.department
WHERE o.department <> m.department
) v(column_name, old_value, new_value);
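
The per-column comparison used above can also be generated from a list of column names. The following is only a minimal sketch, assuming Python's built-in sqlite3 module as a stand-in for SQL Server (SQLite has no CROSS APPLY, so each column becomes one UNION ALL branch). The two-column tables and the helper name diff_query are made up for illustration, and the column names are assumed to be trusted because they are interpolated into the SQL text.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE employee_original (emp_id INTEGER PRIMARY KEY, last_name TEXT, salary INTEGER);
CREATE TABLE employee_modified (emp_id INTEGER PRIMARY KEY, last_name TEXT, salary INTEGER);
INSERT INTO employee_original VALUES (1, 'Schultz', 100), (2, 'Trantow', 200);
INSERT INTO employee_modified VALUES (1, 'Schultz', 100), (2, 'Wyman', 500);
""")


def diff_query(columns):
    """Build one SELECT per compared column and glue them with UNION ALL.

    Hypothetical helper: column names must come from a trusted list.
    """
    branches = [
        f"SELECT o.emp_id AS emp_id, '{col}' AS attribute, "
        f"CAST(o.{col} AS TEXT) AS original_value, "
        f"CAST(m.{col} AS TEXT) AS new_value "
        f"FROM employee_original o "
        f"JOIN employee_modified m ON m.emp_id = o.emp_id "
        f"WHERE o.{col} <> m.{col}"
        for col in columns
    ]
    # ORDER BY uses column positions, which is always valid in a compound SELECT
    return " UNION ALL ".join(branches) + " ORDER BY 1, 2"


changes = conn.execute(diff_query(["last_name", "salary"])).fetchall()
for row in changes:
    print(row)
```

Casting every value to text keeps all UNION ALL branches on one common type, which is the same reason the T-SQL version needs explicit casts for non-string columns.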

Related

SQL Recursive Count

I have two tables I am joining with the following structure:
SET ANSI_NULLS ON
GO
SET QUOTED_IDENTIFIER ON
GO
CREATE TABLE [dbo].[ContentDivider](
[Id] [int] IDENTITY(1,1) NOT NULL,
[ParentId] [int] NULL,
[Name] [nvarchar](128) NOT NULL,
CONSTRAINT [PK_ContentDivider] PRIMARY KEY CLUSTERED
(
[Id] ASC
)WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]
) ON [PRIMARY]
GO
/****** Object: Table [dbo].[CustomPage] Script Date: 23-03-2020 17:46:09 ******/
SET ANSI_NULLS ON
GO
SET QUOTED_IDENTIFIER ON
GO
CREATE TABLE [dbo].[CustomPage](
[Id] [int] IDENTITY(1,1) NOT NULL,
[ContentDividerId] [int] NOT NULL,
[Name] [nvarchar](128) NOT NULL,
CONSTRAINT [PK_CustomPage] PRIMARY KEY CLUSTERED
(
[Id] ASC
)WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]
) ON [PRIMARY]
GO
A ContentDivider can have n ContentDividers as children and m CustomPages as children as well.
I want a view that displays each ContentDivider together with the count of all CustomPages beneath it, including those attached to its descendant ContentDividers.
My Test data:
SET IDENTITY_INSERT [dbo].[ContentDivider] ON
GO
INSERT [dbo].[ContentDivider] ([Id], [ParentId], [Name]) VALUES (1, NULL, N'TopLevel1')
INSERT [dbo].[ContentDivider] ([Id], [ParentId], [Name]) VALUES (2, NULL, N'TopLevel2')
INSERT [dbo].[ContentDivider] ([Id], [ParentId], [Name]) VALUES (3, NULL, N'TopLevel3')
INSERT [dbo].[ContentDivider] ([Id], [ParentId], [Name]) VALUES (4, 1, N'SecondLevel1')
INSERT [dbo].[ContentDivider] ([Id], [ParentId], [Name]) VALUES (5, 1, N'SecondLevel2')
INSERT [dbo].[ContentDivider] ([Id], [ParentId], [Name]) VALUES (6, 1, N'SecondLevel3')
INSERT [dbo].[ContentDivider] ([Id], [ParentId], [Name]) VALUES (7, 4, N'ThirdLevel1')
INSERT [dbo].[ContentDivider] ([Id], [ParentId], [Name]) VALUES (8, 4, N'ThirdLevel2')
GO
SET IDENTITY_INSERT [dbo].[ContentDivider] OFF
GO
SET IDENTITY_INSERT [dbo].[CustomPage] ON
GO
INSERT [dbo].[CustomPage] ([Id], [ContentDividerId], [Name]) VALUES (1, 1, N'Level1_1')
INSERT [dbo].[CustomPage] ([Id], [ContentDividerId], [Name]) VALUES (2, 1, N'Level1_2')
INSERT [dbo].[CustomPage] ([Id], [ContentDividerId], [Name]) VALUES (3, 2, N'Level1_3')
INSERT [dbo].[CustomPage] ([Id], [ContentDividerId], [Name]) VALUES (4, 2, N'Level1_4')
INSERT [dbo].[CustomPage] ([Id], [ContentDividerId], [Name]) VALUES (5, 4, N'Level1_5')
INSERT [dbo].[CustomPage] ([Id], [ContentDividerId], [Name]) VALUES (6, 5, N'Level1_6')
INSERT [dbo].[CustomPage] ([Id], [ContentDividerId], [Name]) VALUES (7, 7, N'Level1_7')
INSERT [dbo].[CustomPage] ([Id], [ContentDividerId], [Name]) VALUES (8, 8, N'Level1_8')
GO
SET IDENTITY_INSERT [dbo].[CustomPage] OFF
GO
And the View I want to extend:
SELECT dbo.ContentDivider.ParentId, dbo.ContentDivider.Name, dbo.ContentDivider.Id, COUNT(DISTINCT dbo.CustomPage.Id) AS CustomPageCount
FROM dbo.ContentDivider LEFT OUTER JOIN
dbo.CustomPage ON dbo.ContentDivider.Id = dbo.CustomPage.ContentDividerId
GROUP BY dbo.ContentDivider.ParentId, dbo.ContentDivider.Name, dbo.ContentDivider.Id
For now, the view only counts the CustomPages directly underneath each ContentDivider. I would like all CustomPages in the subtree to be counted.
Any suggestions?
The expected result would be:
View

This sounds like a perfect situation for a recursive CTE ;)
So, if I understood correctly, your expected result would be TopLevel1 with 6 pages and TopLevel2 with 2 pages, since all the other levels are somewhere beneath these two?
The CTE might look something like this (you may have to add the MAXRECURSION option):
WITH cte AS(
SELECT 1 lvl, ID AS ParentID, ID, Name
FROM dbo.ContentDivider cd
WHERE ParentId IS NULL
UNION ALL
SELECT c.lvl+1 AS lvl, c.ParentID, cd.ID, cd.Name
FROM dbo.ContentDivider cd
INNER JOIN cte c ON cd.ParentID = c.ID
)
SELECT c.ParentID, cd.Name, COUNT(DISTINCT cp.Id) AS CustomPageCount
FROM cte c
JOIN dbo.ContentDivider cd ON cd.ID = c.ParentID
LEFT OUTER JOIN dbo.CustomPage cp ON cp.ContentDividerId = c.id
GROUP BY c.ParentId, cd.Name
This leads to all pages being assigned to their top level.
See fiddle for details: http://sqlfiddle.com/#!18/f1a44/28/1
Edit: Since you need the details down to the divider ID, I extended my example in the fiddle. First, I fetch the page count per ID in one CTE, and additionally the page count aggregated per level (a ParentID and all of its IDs). With that done, the following CTEs need no further counting or grouping.
In the final query I then check whether the current row's ID is a top level of any kind and assign the corresponding page count to that row.
WITH cteCnt AS(
SELECT cd.ID, COUNT(DISTINCT cp.Id) AS CustomPageCount
FROM dbo.ContentDivider cd
LEFT OUTER JOIN dbo.CustomPage cp ON cp.ContentDividerId = cd.id
GROUP BY cd.ID
),
cteTop AS(
SELECT cd.ID, COUNT(DISTINCT cp.Id) AS CustomPageCount
FROM dbo.ContentDivider cd
LEFT OUTER JOIN dbo.CustomPage cp ON cp.ContentDividerId = cd.id
GROUP BY cd.ID
UNION ALL
SELECT cd.ParentID, COUNT(DISTINCT cp.Id) AS CustomPageCount
FROM dbo.ContentDivider cd
LEFT OUTER JOIN dbo.CustomPage cp ON cp.ContentDividerId = cd.id
WHERE cd.ParentID IS NOT NULL
GROUP BY cd.ParentID
),
cteTopSum AS(
SELECT ID, SUM(CustomPageCount) AS CustomPageCount
FROM ctetop
GROUP BY ID
),
cte AS(
SELECT 1 lvl, cd.ID AS ParentID, cd.ID AS ParentIDx, cd.ID, cd.Name, cnt.CustomPageCount
FROM dbo.ContentDivider cd
INNER JOIN cteCnt cnt ON cnt.ID = cd.ID
WHERE ParentId IS NULL
UNION ALL
SELECT c.lvl+1 AS lvl, c.ParentID, cd.ParentID AS ParentIDx, cd.ID, cd.Name, cnt.CustomPageCount
FROM dbo.ContentDivider cd
INNER JOIN cteCnt cnt ON cnt.ID = cd.ID
INNER JOIN cte c ON cd.ParentID = c.ID
),
cteOut AS(
SELECT *
,SUM(CustomPageCount) OVER (PARTITION BY ParentID) x
,SUM(CustomPageCount) OVER (PARTITION BY ParentIDx) y
FROM cte c
)
SELECT CASE WHEN co.ParentIDx = co.ID THEN NULL ELSE co.ParentIDx END AS ParentID, co.ID, co.Name, CASE WHEN co.ID = co.ParentID THEN co.X ELSE ts.CustomPageCount END CustomPageCount
FROM cteOut co
LEFT JOIN cteTopSum ts ON ts.ID = co.ID
ORDER BY 1, 2
See new fiddle for details: http://sqlfiddle.com/#!18/f1a44/185/1
I'm not sure whether there is a nicer way to solve this, but this seems to solve the problem.
However, I did NOT check whether it works for an arbitrary number of sublevels. If you find any issues, feel free to comment.
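
For an arbitrary number of sublevels, one alternative (not the approach in the fiddle) is to let the recursive CTE pair every divider with itself and all of its descendants, and then count the pages over those pairs. This is only a sketch, assuming Python's built-in sqlite3 module as a stand-in for SQL Server. The CTE name subtree and its columns AncestorId and DescendantId are invented for illustration; in T-SQL the RECURSIVE keyword would be dropped.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE ContentDivider (Id INTEGER PRIMARY KEY, ParentId INTEGER, Name TEXT);
CREATE TABLE CustomPage (Id INTEGER PRIMARY KEY, ContentDividerId INTEGER, Name TEXT);
INSERT INTO ContentDivider VALUES
  (1, NULL, 'TopLevel1'), (2, NULL, 'TopLevel2'), (3, NULL, 'TopLevel3'),
  (4, 1, 'SecondLevel1'), (5, 1, 'SecondLevel2'), (6, 1, 'SecondLevel3'),
  (7, 4, 'ThirdLevel1'), (8, 4, 'ThirdLevel2');
INSERT INTO CustomPage VALUES
  (1, 1, 'Level1_1'), (2, 1, 'Level1_2'), (3, 2, 'Level1_3'), (4, 2, 'Level1_4'),
  (5, 4, 'Level1_5'), (6, 5, 'Level1_6'), (7, 7, 'Level1_7'), (8, 8, 'Level1_8');
""")

QUERY = """
WITH RECURSIVE subtree(AncestorId, DescendantId) AS (
    -- anchor: every divider belongs to its own subtree
    SELECT Id, Id FROM ContentDivider
    UNION ALL
    -- step: children of a known descendant belong to the same ancestor
    SELECT s.AncestorId, cd.Id
    FROM subtree s
    JOIN ContentDivider cd ON cd.ParentId = s.DescendantId
)
SELECT d.Id, d.ParentId, d.Name, COUNT(cp.Id) AS CustomPageCount
FROM ContentDivider d
JOIN subtree s ON s.AncestorId = d.Id
LEFT JOIN CustomPage cp ON cp.ContentDividerId = s.DescendantId
GROUP BY d.Id, d.ParentId, d.Name
ORDER BY d.Id
"""

counts = {row[0]: row[3] for row in conn.execute(QUERY)}
print(counts)
```

With the test data from the question this yields 6 pages for TopLevel1 and 2 for TopLevel2, matching the totals mentioned above, and it also gives a subtree count for every intermediate divider.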

SQL left join for inactive records

I have 2 tables. One of them has actual names and the other one has nicknames used by those people.
CREATE TABLE [dbo].[Customer](
[id] [int] IDENTITY(1,1) NOT NULL,
[firstName] [varchar](50) NULL,
[lastName] [varchar](50) NULL,
[active] [bit] NULL
) ON [PRIMARY]
GO
CREATE TABLE [dbo].[CustomerAKA](
[id] [int] NULL,
[akaFirstName] [varchar](50) NULL,
[akaLastName] [varchar](50) NULL
) ON [PRIMARY]
GO
SET IDENTITY_INSERT [dbo].[Customer] ON
INSERT [dbo].[Customer] ([id], [firstName], [lastName], [active]) VALUES (1, N'Op', N'Test', 0)
INSERT [dbo].[Customer] ([id], [firstName], [lastName], [active]) VALUES (2, N'M', N'J', 1)
INSERT [dbo].[Customer] ([id], [firstName], [lastName], [active]) VALUES (3, N'John', N'Doe', 1)
SET IDENTITY_INSERT [dbo].[Customer] OFF
INSERT [dbo].[CustomerAKA] ([id], [akaFirstName], [akaLastName]) VALUES (1, N'Hello', N'Test')
INSERT [dbo].[CustomerAKA] ([id], [akaFirstName], [akaLastName]) VALUES (1, N'Mahalo', N'Test')
INSERT [dbo].[CustomerAKA] ([id], [akaFirstName], [akaLastName]) VALUES (3, N'Jonny', N'Doe')
My query is :
select *
from dbo.Customer c1
left join dbo.CustomerAKA c2 on c2.id = c1.id
where not exists ( select *
from dbo.Customer c
where c.id = c1.id
and c.active = 0 )
Even though Op Test is not active, I still want to get the nicknames for him:
1 Hello Test
1 Mahalo Test
So my output should be :
M J
John Doe
Jonny Doe
Hello Test
Mahalo Test
Any ideas?

I think you want union all:
select c.firstname, c.lastname
from customer c
where c.active = 1
union all
select ca.akafirstname, ca.akalastname
from customeraka ca join
customer c
on ca.id = c.id;

Try this:
/*
WITH
Customer (id, firstName, lastName, active) AS
(
VALUES
(1, 'Op', 'Test', 0)
, (2, 'M', 'J', 1)
, (3, 'John', 'Doe', 1)
)
, CustomerAKA (id, akaFirstName, akaLastName) AS
(
VALUES
(1, 'Hello', 'Test')
, (1, 'Mahalo', 'Test')
, (3, 'Jonny', 'Doe')
)
*/
SELECT firstName, lastName
FROM Customer
WHERE active = 1
UNION ALL
SELECT c2.akaFirstName AS firstName, c2.akaLastName AS lastName
FROM CustomerAKA c2
JOIN Customer c1 ON c1.id=c2.id;
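
To check the UNION ALL idea against the sample data, here is a small sketch, assuming Python's built-in sqlite3 module in place of SQL Server. The tables are trimmed copies of the ones in the question, and the rows are sorted only to make the output order predictable.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Customer (id INTEGER PRIMARY KEY, firstName TEXT, lastName TEXT, active INTEGER);
CREATE TABLE CustomerAKA (id INTEGER, akaFirstName TEXT, akaLastName TEXT);
INSERT INTO Customer VALUES (1, 'Op', 'Test', 0), (2, 'M', 'J', 1), (3, 'John', 'Doe', 1);
INSERT INTO CustomerAKA VALUES (1, 'Hello', 'Test'), (1, 'Mahalo', 'Test'), (3, 'Jonny', 'Doe');
""")

QUERY = """
-- real names: active customers only
SELECT firstName, lastName FROM Customer WHERE active = 1
UNION ALL
-- nicknames: every customer, active or not
SELECT a.akaFirstName, a.akaLastName
FROM CustomerAKA a
JOIN Customer c ON c.id = a.id
"""

names = sorted(conn.execute(QUERY).fetchall())
print(names)
```

The inactive customer "Op Test" is left out, while both of his nicknames are still returned.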

I'm not sure what you want, but it seems to be something like:
select *
from dbo.Customer c1
left join dbo.CustomerAKA c2 on c2.id = c1.id
where c1.active = 1 or (c1.firstname = 'Op' and c1.lastname = 'Test')
Order by c1.active desc
Or like:
select
coalesce(c2.akaFirstName, c1.firstname) as firstname,
coalesce(c2.akaLastName, c1.lastname) as lastname
from dbo.Customer c1
Left outer join dbo.CustomerAKA c2 on c2.id = c1.id
Order by c1.active desc

SELECT data from multiple tables using max?

I have Project and Score tables like this
How do I query to get this result, to show all ProjectID, ProjectName, and its newest Score (latest Date) with the Date:
I tried:
SELECT R.ProjectID, Name, Score, Date
FROM PWINProject, PWINRecord R
WHERE Date = (
SELECT max(Date)
FROM PWINRecord
WHERE ProjectID = R.ProjectID
)
AND PWINProject.ProjectID = R.ProjectID
But it only shows me projects with a score, when a project doesn't have a score yet (e.g. #3 - Amazon) it won't show.

You need a left join instead of an inner join (and you should use an explicit join rather than the old-style comma join you are using). However, it also simplifies the query to use a window function to determine which row to return, e.g.
declare @PWINProject table (ProjectId int, [Name] varchar(256))
insert into @PWINProject(ProjectId, [Name])
select 1, 'Database'
union all select 2, 'Microsoft'
union all select 3, 'Amazon'
union all select 4, 'IBM'
declare @PWINRecord table (ScoreId int, ProjectId int, Score int, [Date] datetime)
insert into @PWINRecord (ScoreId, ProjectId, Score, [Date])
select 1, 1, 100, '2019-01-15 19:40:46.723'
union all select 2, 1, 52, '2019-01-15 20:40:46.723'
union all select 3, 2, 60, '2019-01-15 21:40:46.723'
union all select 4, 2, 55, '2019-01-15 22:40:46.723'
union all select 5, 2, 72, '2019-01-15 23:41:46.723'
union all select 6, 4, 111, '2019-01-16 10:40:46.723'
union all select 7, 4, 90, '2019-01-17 12:40:46.723'
select ProjectId, [Name], Score, [Date]
from (
SELECT P.ProjectID, [Name], Score, [Date]
-- partition by P.ProjectID so that projects without any score each keep their own row
, row_number() over (partition by P.ProjectID order by [Date] desc) Row#
FROM @PWINProject P
left join @PWINRecord R on R.ProjectID = P.ProjectID
) X
where X.Row# = 1
order by ProjectId
Returns:
ProjectID Name Score Date
1 Database 52 2019-01-15 20:40:46.723
2 Microsoft 72 2019-01-15 23:41:46.723
3 Amazon NULL NULL
4 IBM 90 2019-01-17 12:40:46.723
PS: This is the recommended style for posting an SQL question, where you set up the data in temp tables or table variables yourself. It saves the people answering a lot of time. Data in images is a no-no.
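
A detail worth checking in the window-function approach is which column the partition uses. The sketch below assumes Python's built-in sqlite3 module (SQLite 3.25 or later for window functions) as a stand-in for SQL Server. The fifth project, 'Oracle', is a made-up row added so that two projects have no score; partitioning by the project table's ID keeps both of them, whereas partitioning by the record table's ID would put them into one shared NULL partition.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE PWINProject (ProjectID INTEGER PRIMARY KEY, Name TEXT);
CREATE TABLE PWINRecord (ScoreId INTEGER PRIMARY KEY, ProjectID INTEGER, Score INTEGER, [Date] TEXT);
-- 'Oracle' is a hypothetical extra project with no score
INSERT INTO PWINProject VALUES
  (1, 'Database'), (2, 'Microsoft'), (3, 'Amazon'), (4, 'IBM'), (5, 'Oracle');
INSERT INTO PWINRecord VALUES
  (1, 1, 100, '2019-01-15 19:40:46'), (2, 1, 52, '2019-01-15 20:40:46'),
  (3, 2, 60, '2019-01-15 21:40:46'), (5, 2, 72, '2019-01-15 23:41:46'),
  (7, 4, 90, '2019-01-17 12:40:46');
""")

QUERY = """
SELECT ProjectID, Name, Score, [Date]
FROM (
    SELECT P.ProjectID, P.Name, R.Score, R.[Date],
           ROW_NUMBER() OVER (PARTITION BY P.ProjectID
                              ORDER BY R.[Date] DESC) AS rn
    FROM PWINProject P
    LEFT JOIN PWINRecord R ON R.ProjectID = P.ProjectID
) AS X
WHERE rn = 1
ORDER BY ProjectID
"""

latest = conn.execute(QUERY).fetchall()
for row in latest:
    print(row)
```

Both score-less projects come back with NULL (None in Python) for Score and Date, and every other project shows its most recent record.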

One approach would be to use OUTER APPLY:
DECLARE @Project TABLE (ProjectId INT, ProjectName VARCHAR(100))
INSERT INTO @Project
VALUES (1, 'Database'), (2, 'Microsoft'), (3, 'Amazon'), (4, 'IBM')
DECLARE @Score TABLE(ScoreId INT, ProjectId INT, Score INT, RefDate DATE)
INSERT INTO @Score
VALUES (1, 1, 100, '2019-01-01'), (2, 2, 200, '2019-02-02'), (4, 4, 400, '2019-04-04')
SELECT
P.ProjectId,
P.ProjectName,
S.Score,
S.RefDate
FROM @Project P
OUTER APPLY (
SELECT TOP 1 S.*
FROM @Score S
WHERE S.ProjectId = P.ProjectId ORDER BY S.RefDate DESC
) S
Be careful: OUTER APPLY is clean and easy to understand, but it may not be the most efficient option. Other techniques, such as ROW_NUMBER, may also work; study the execution plans to see what fits best.

Try this:
select P.ProjectID, P.ProjectName, R.Score, R.Date
from PWINProject P
left outer join PWINRecord R on R.ProjectID = P.ProjectID
and R.Date = (
select max(R2.Date)
from PWINRecord R2
where R2.ProjectID = R.ProjectID
)

A simple solution is the following:
select p.ProjectID, p.ProjectName, s.Score as LatestScore, s.[date]
from Project as p
left outer join Scores as s ON p.ProjectID = s.ProjectID
where (
s.[date] = (
select top (1) s2.[date]
from Scores as s2
where s2.ProjectID = s.ProjectID
order by [date] desc
)
or s.[date] is null
)

CREATE TABLE [dbo].[PWINProject](
[ProjectID] [int] IDENTITY(1,1) NOT NULL,
[Name] [nvarchar](50) NULL,
CONSTRAINT [PK_PWINProject] PRIMARY KEY CLUSTERED
(
[ProjectID] ASC
) WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]
) ON [PRIMARY]
CREATE TABLE [dbo].[PWINRecord](
[ScoreId] [int] NOT NULL,
[ProjectID] [int] NULL,
[Score] [int] NULL,
[Date] [datetime] NULL,
CONSTRAINT [PK_PWINRecord] PRIMARY KEY CLUSTERED
(
[ScoreId] ASC
) WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]
) ON [PRIMARY]
SET IDENTITY_INSERT [dbo].[PWINProject] ON
INSERT [dbo].[PWINProject] ([ProjectID], [Name]) VALUES (1, N'Database')
INSERT [dbo].[PWINProject] ([ProjectID], [Name]) VALUES (2, N'Microsoft')
INSERT [dbo].[PWINProject] ([ProjectID], [Name]) VALUES (3, N'Amazon')
INSERT [dbo].[PWINProject] ([ProjectID], [Name]) VALUES (4, N'IBM')
SET IDENTITY_INSERT [dbo].[PWINProject] OFF
INSERT [dbo].[PWINRecord] ([ScoreId], [ProjectID], [Score], [Date]) VALUES (1, 1, 100, CAST(N'2019-01-15 00:00:00.000' AS DateTime))
INSERT [dbo].[PWINRecord] ([ScoreId], [ProjectID], [Score], [Date]) VALUES (2, 1, 52, CAST(N'2019-01-15 00:00:00.000' AS DateTime))
INSERT [dbo].[PWINRecord] ([ScoreId], [ProjectID], [Score], [Date]) VALUES (3, 2, 60, CAST(N'2019-01-15 00:00:00.000' AS DateTime))
INSERT [dbo].[PWINRecord] ([ScoreId], [ProjectID], [Score], [Date]) VALUES (4, 2, 55, CAST(N'2019-01-15 00:00:00.000' AS DateTime))
INSERT [dbo].[PWINRecord] ([ScoreId], [ProjectID], [Score], [Date]) VALUES (5, 2, 72, CAST(N'2019-01-15 00:00:00.000' AS DateTime))
INSERT [dbo].[PWINRecord] ([ScoreId], [ProjectID], [Score], [Date]) VALUES (6, 4, 111, CAST(N'2019-01-16 00:00:00.000' AS DateTime))
INSERT [dbo].[PWINRecord] ([ScoreId], [ProjectID], [Score], [Date]) VALUES (7, 4, 90, CAST(N'2019-01-17 00:00:00.000' AS DateTime))
ALTER TABLE [dbo].[PWINRecord] WITH CHECK ADD CONSTRAINT [FK_PWINRecord_PWINProject] FOREIGN KEY([ProjectID])
REFERENCES [dbo].[PWINProject] ([ProjectID])
ALTER TABLE [dbo].[PWINRecord] CHECK CONSTRAINT [FK_PWINRecord_PWINProject]
-- To select your desired output run the below code
;WITH CTE
AS
(
SELECT p.ProjectID, p.Name, r.Score, r.[Date], ROW_NUMBER() OVER (PARTITION BY p.ProjectID ORDER BY r.[Date] DESC) RN
FROM PWINProject p
left join PWINRecord r on p.ProjectID = r.ProjectID
)
SELECT ProjectID, Name, Score, [Date]
FROM CTE
WHERE RN = 1

Display the latest row only

Goal:
If the column secondid contains duplicate values, retrieve only the row with the latest date for each value. For instance, in the data below the value 6 in column secondid has two different datetimes, and I would like to retrieve the row with '2016-05-02 07:34:14.377'.
Problem:
My code does not seem to work. What am I missing?
Info:
There is a lot of data, so the values cannot be hard-coded in the query.
CREATE TABLE [dbo].[testing2](
[id] [int] NOT NULL,
[secondid] [int] NULL,
[value] [varchar](30) NULL,
[category] [int] NULL,
[test_id] [int] NULL,
[id_type] [int] NOT NULL,
[Testing2Datetime] [datetime] not NULL,
CONSTRAINT [PK_testing2] PRIMARY KEY CLUSTERED
(
[id] ASC
)WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]
) ON [PRIMARY]
INSERT INTO [test].[dbo].[testing2]
VALUES (3, 3, 'a' ,2 ,11 ,1, '2016-05-01 07:34:14.377');
INSERT INTO [test].[dbo].[testing2]
VALUES (4, 4, 'a' ,2 ,11 ,1, '2016-05-01 07:34:14.377');
INSERT INTO [test].[dbo].[testing2]
VALUES (5, 5, 'a' ,2 ,11 ,0, '2016-05-01 07:34:14.377');
INSERT INTO [test].[dbo].[testing2]
VALUES (6, 6, 'a' ,2 ,11 ,2, '2016-05-01 07:34:14.377');
INSERT INTO [test].[dbo].[testing2]
VALUES (7, 6, 'a' ,2 ,11 ,2, '2016-05-02 07:34:14.377');
select
bb.secondid,
max(bb.Testing2Datetime)
from [dbo].[testing2] bb
group by
bb.secondid,
bb.Testing2Datetime

The maximum value of Testing2Datetime per Testing2Datetime is the Testing2Datetime itself. You should remove it from the group by clause and retrieve the maximum value per secondid only:
select
bb.secondid,
max(bb.Testing2Datetime)
from [dbo].[testing2] bb
group by
bb.secondid -- Here!

Remove the bb.Testing2Datetime column from the GROUP BY:
select
bb.secondid,
max(bb.Testing2Datetime) as [Max_Testing2Datetime]
from [dbo].[testing2] bb
group by
bb.secondid
Or even this (ROW_NUMBER window function):
select *
from
(
select
bb.secondid,
bb.Testing2Datetime,
Row_number()over(partition by bb.secondid order by bb.Testing2Datetime desc) as RN
from [dbo].[testing2] bb
) A
Where RN = 1
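
If the whole row is needed rather than just secondid and the date, the grouped maximum can be joined back to the table. This is a sketch only, assuming Python's built-in sqlite3 module instead of SQL Server and a trimmed version of the testing2 table. The alias latest is made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE testing2 (id INTEGER PRIMARY KEY, secondid INTEGER, Testing2Datetime TEXT);
INSERT INTO testing2 VALUES
  (3, 3, '2016-05-01 07:34:14.377'), (4, 4, '2016-05-01 07:34:14.377'),
  (5, 5, '2016-05-01 07:34:14.377'), (6, 6, '2016-05-01 07:34:14.377'),
  (7, 6, '2016-05-02 07:34:14.377');
""")

QUERY = """
SELECT t.id, t.secondid, t.Testing2Datetime
FROM testing2 t
JOIN (
    -- latest datetime per secondid
    SELECT secondid, MAX(Testing2Datetime) AS latest
    FROM testing2
    GROUP BY secondid
) m ON m.secondid = t.secondid AND m.latest = t.Testing2Datetime
ORDER BY t.id
"""

rows = conn.execute(QUERY).fetchall()
for row in rows:
    print(row)
```

Unlike the ROW_NUMBER version, this returns more than one row for a secondid when several rows share the same latest datetime.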

SQL Server - Query to split time by count (overlapping offices)

I'm looking for some advice on the approach I should take with a query. I have a table (EMP) which stores employee details and working hours for this year (40 hours per week). A further 2 tables store the primary and secondary offices employees belong to. Since employees can move between offices, these are stored with dates.
I'm looking to return the number of working hours during the time the employee is in an office. If primary offices overlap with secondary offices for an employee, the hours should be split by the number of overlapping offices for the overlapping period only.
I attach sample DDL below.
-- Employee Table with hours for year 2014
CREATE TABLE [dbo].[EMP](
[EMP_ID] [int] NOT NULL,
[EMP_NAME] [varchar](255) NULL,
[EMP_FYHOURS] [float] NULL,
CONSTRAINT [PK_EMP] PRIMARY KEY CLUSTERED
(
[EMP_ID] ASC
)WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON, FILLFACTOR = 80) ON [PRIMARY]
) ON [PRIMARY]
GO
-- Employees and their primary offices
CREATE TABLE [dbo].[OFFICEPRIMARY](
[OFFICEPRIMARY_ID] [int] NOT NULL,
[OFFICEPRIMARY_NAME] [varchar](255) NULL,
[OFFICEPRIMARY_EMP_ID] [int] NOT NULL,
[OFFICEPRIMARY_START] [datetime] NULL,
[OFFICEPRIMARY_END] [datetime] NULL,
CONSTRAINT [PK_OFFICEPRIMARY] PRIMARY KEY CLUSTERED
(
[OFFICEPRIMARY_ID] ASC
)WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON, FILLFACTOR = 80) ON [PRIMARY]
) ON [PRIMARY]
GO
SET ANSI_PADDING OFF
GO
ALTER TABLE [dbo].[OFFICEPRIMARY] WITH CHECK ADD CONSTRAINT [FK_OFFICEPRIMARY_FK1] FOREIGN KEY([OFFICEPRIMARY_EMP_ID])
REFERENCES [dbo].[EMP] ([EMP_ID])
ON DELETE CASCADE
GO
ALTER TABLE [dbo].[OFFICEPRIMARY] CHECK CONSTRAINT [FK_OFFICEPRIMARY_FK1]
GO
-- Employees and their secondary offices
CREATE TABLE [dbo].[OFFICESECONDARY](
[OFFICESECONDARY_ID] [int] NOT NULL,
[OFFICESECONDARY_NAME] [varchar](255) NULL,
[OFFICESECONDARY_EMP_ID] [int] NOT NULL,
[OFFICESECONDARY_START] [datetime] NULL,
[OFFICESECONDARY_END] [datetime] NULL,
CONSTRAINT [PK_OFFICESECONDARY] PRIMARY KEY CLUSTERED
(
[OFFICESECONDARY_ID] ASC
)WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON, FILLFACTOR = 80) ON [PRIMARY]
) ON [PRIMARY]
GO
SET ANSI_PADDING OFF
GO
ALTER TABLE [dbo].[OFFICESECONDARY] WITH CHECK ADD CONSTRAINT [FK_OFFICESECONDARY_FK1] FOREIGN KEY([OFFICESECONDARY_EMP_ID])
REFERENCES [dbo].[EMP] ([EMP_ID])
ON DELETE CASCADE
GO
ALTER TABLE [dbo].[OFFICESECONDARY] CHECK CONSTRAINT [FK_OFFICESECONDARY_FK1]
GO
-- Insert sample data
INSERT INTO EMP (EMP_ID, EMP_NAME, EMP_FYHOURS)
VALUES (1, 'John Smith', 2080);
INSERT INTO EMP (EMP_ID, EMP_NAME, EMP_FYHOURS)
VALUES (2, 'Jane Doe', 2080);
GO
INSERT INTO OFFICEPRIMARY (OFFICEPRIMARY_ID, OFFICEPRIMARY_NAME, OFFICEPRIMARY_EMP_ID, OFFICEPRIMARY_START, OFFICEPRIMARY_END)
VALUES (1, 'London', 1, '2014-01-01', '2014-05-31')
INSERT INTO OFFICEPRIMARY (OFFICEPRIMARY_ID, OFFICEPRIMARY_NAME, OFFICEPRIMARY_EMP_ID, OFFICEPRIMARY_START, OFFICEPRIMARY_END)
VALUES (2, 'Berlin', 1, '2014-06-01', '2014-08-31')
INSERT INTO OFFICEPRIMARY (OFFICEPRIMARY_ID, OFFICEPRIMARY_NAME, OFFICEPRIMARY_EMP_ID, OFFICEPRIMARY_START, OFFICEPRIMARY_END)
VALUES (3, 'New York', 1, '2014-09-01', '2014-12-31')
INSERT INTO OFFICEPRIMARY (OFFICEPRIMARY_ID, OFFICEPRIMARY_NAME, OFFICEPRIMARY_EMP_ID, OFFICEPRIMARY_START, OFFICEPRIMARY_END)
VALUES (4, 'New York', 2, '2014-01-01', '2014-04-15')
INSERT INTO OFFICEPRIMARY (OFFICEPRIMARY_ID, OFFICEPRIMARY_NAME, OFFICEPRIMARY_EMP_ID, OFFICEPRIMARY_START, OFFICEPRIMARY_END)
VALUES (5, 'Paris', 2, '2014-04-16', '2014-09-30')
INSERT INTO OFFICEPRIMARY (OFFICEPRIMARY_ID, OFFICEPRIMARY_NAME, OFFICEPRIMARY_EMP_ID, OFFICEPRIMARY_START, OFFICEPRIMARY_END)
VALUES (6, 'London', 2, '2014-10-01', '2014-12-31')
GO
INSERT INTO OFFICESECONDARY (OFFICESECONDARY_ID, OFFICESECONDARY_NAME, OFFICESECONDARY_EMP_ID, OFFICESECONDARY_START, OFFICESECONDARY_END)
VALUES (1, 'Paris', 1, '2014-01-01', '2014-03-31')
INSERT INTO OFFICESECONDARY (OFFICESECONDARY_ID, OFFICESECONDARY_NAME, OFFICESECONDARY_EMP_ID, OFFICESECONDARY_START, OFFICESECONDARY_END)
VALUES (2, 'Lyon', 1, '2014-04-01', '2014-05-15')
INSERT INTO OFFICESECONDARY (OFFICESECONDARY_ID, OFFICESECONDARY_NAME, OFFICESECONDARY_EMP_ID, OFFICESECONDARY_START, OFFICESECONDARY_END)
VALUES (3, 'Berlin', 1, '2014-05-16', '2014-09-30')
INSERT INTO OFFICESECONDARY (OFFICESECONDARY_ID, OFFICESECONDARY_NAME, OFFICESECONDARY_EMP_ID, OFFICESECONDARY_START, OFFICESECONDARY_END)
VALUES (4, 'Chicago', 1, '2014-10-01', '2015-02-22')
INSERT INTO OFFICESECONDARY (OFFICESECONDARY_ID, OFFICESECONDARY_NAME, OFFICESECONDARY_EMP_ID, OFFICESECONDARY_START, OFFICESECONDARY_END)
VALUES (5, 'Chicago', 2, '2013-11-21', '2014-04-10')
INSERT INTO OFFICESECONDARY (OFFICESECONDARY_ID, OFFICESECONDARY_NAME, OFFICESECONDARY_EMP_ID, OFFICESECONDARY_START, OFFICESECONDARY_END)
VALUES (6, 'Berlin', 2, '2014-04-11', '2014-09-16')
INSERT INTO OFFICESECONDARY (OFFICESECONDARY_ID, OFFICESECONDARY_NAME, OFFICESECONDARY_EMP_ID, OFFICESECONDARY_START, OFFICESECONDARY_END)
VALUES (7, 'Amsterdam', 2, '2014-09-17', '2015-03-31')
GO
Thanks for the pointer. I adjusted your query so it presents a union of the primary and secondary office.
All that remains is working out the hours for overlapping periods between offices. For example,
John Smith, New York, 01/04/2014, 10/08/2014
John Smith, London, 01/08/2014, 31/12/2014
For the overlapping period between the offices which is 01/08/2014 to 10/08/2014, I would expect the hours to be split equally. If there were 3 overlapping offices, then it would be split 3-ways.
select 'Primary' as Office, e.EMP_NAME, op.OFFICEPRIMARY_NAME, op.OFFICEPRIMARY_START, op.OFFICEPRIMARY_END, datediff(wk,OFFICEPRIMARY_START,OFFICEPRIMARY_END) * 40 as HoursWorkedPrimary
from EMP e
inner join OFFICEPRIMARY op on op.OFFICEPRIMARY_EMP_ID = e.EMP_ID
union all
select 'Secondary' as Office, e.EMP_NAME, os.OFFICESECONDARY_NAME, os.OFFICESECONDARY_START, os.OFFICESECONDARY_END, datediff(wk,OFFICESECONDARY_START,OFFICESECONDARY_END) * 40 as HoursWorkedSecondary
from EMP e
inner join OFFICESECONDARY os on os.OFFICESECONDARY_EMP_ID = e.EMP_ID
order by e.EMP_NAME

If I understand correctly, the end result you want to see is the number of total hours worked per employee and office?
I've come up with this:
-- generate date table
declare @MinDate datetime, @MaxDate datetime
SET @MinDate = (SELECT MIN(d) FROM (SELECT d = OFFICEPRIMARY_START FROM dbo.OFFICEPRIMARY UNION SELECT OFFICESECONDARY_START FROM dbo.OFFICESECONDARY) a)
SET @MaxDate = (SELECT MAX(d) FROM (SELECT d = OFFICEPRIMARY_END FROM dbo.OFFICEPRIMARY UNION SELECT OFFICESECONDARY_END FROM dbo.OFFICESECONDARY) a)
SELECT
d = DATEADD(day, number, @MinDate)
INTO
#tmp_dates
FROM
(SELECT DISTINCT number FROM master.dbo.spt_values WHERE name IS NULL) n
WHERE
-- <= so that the last day of the overall range is included
DATEADD(day, number, @MinDate) <= @MaxDate
;WITH CTE AS
(
SELECT
d.d
,o.OfficeType
,o.OfficeID
,o.OfficeName
,o.EmpID
,EmpName = e.EMP_NAME
,HoursWorked = 8.0 / (COUNT(1) OVER (PARTITION BY EmpID, d)) -- 8.0 avoids integer division when 3 or more offices overlap
FROM
(
SELECT
OfficeType = 1
,OfficeID = op.OFFICEPRIMARY_ID
,OfficeName = op.OFFICEPRIMARY_NAME
,EmpID = op.OFFICEPRIMARY_EMP_ID
,StartDate = op.OFFICEPRIMARY_START
,EndDate = op.OFFICEPRIMARY_END
FROM
dbo.OFFICEPRIMARY op
UNION
SELECT
OfficeType = 2
,OfficeID = os.OFFICESECONDARY_ID
,OfficeName = os.OFFICESECONDARY_NAME
,EmpID = os.OFFICESECONDARY_EMP_ID
,StartDate = os.OFFICESECONDARY_START
,EndDate = os.OFFICESECONDARY_END
FROM
dbo.OFFICESECONDARY os
) o
INNER JOIN
dbo.EMP e ON e.EMP_ID = o.EmpID
INNER JOIN
#tmp_dates d ON o.StartDate<=d.d AND o.EndDate>=d.d
)
SELECT
EmpID
,EmpName
,OfficeType
,OfficeName
,TotalHoursWorked = SUM(HoursWorked)
FROM
CTE
GROUP BY
EmpID
,EmpName
,OfficeType
,OfficeID
,OfficeName
ORDER BY
EmpID
,OfficeName
I first generate a temp table with the dates between minimum date and maximum date.
Then I union both office tables (why do you have two tables anyway?) to get a CTE that returns the employee, date, office and number of hours worked in that office (8 divided by the number of offices the employee worked in on that day).
Then I sum this data to get sum of hours grouped by employee and office.
Maybe there is a simpler solution to this. This was the first solution that came to my mind.
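
The per-day splitting described above can be illustrated outside the database as well. The following is a minimal sketch in plain Python, not a replacement for the query. The function name split_hours and the two short date ranges are invented for illustration, and, like the query, it assumes 8 hours on every calendar day, weekends included.

```python
from collections import defaultdict
from datetime import date, timedelta

HOURS_PER_DAY = 8.0  # assumption carried over from the query above


def split_hours(assignments):
    """Return total hours per office for ONE employee.

    assignments: list of (office, start, end) tuples with inclusive dates.
    Each day's hours are divided equally among the offices active that day.
    """
    totals = defaultdict(float)
    day = min(start for _, start, _ in assignments)
    last = max(end for _, _, end in assignments)
    while day <= last:
        active = [office for office, start, end in assignments if start <= day <= end]
        for office in active:
            totals[office] += HOURS_PER_DAY / len(active)
        day += timedelta(days=1)
    return dict(totals)


hours = split_hours([
    ("New York", date(2014, 8, 1), date(2014, 8, 10)),
    ("London", date(2014, 8, 6), date(2014, 8, 15)),
])
print(hours)
```

Each office is alone for five days (40 hours) and shares five days with the other (4 hours per day, 20 hours), so both end up with 60 hours.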

This should give you a head start:
select datediff(wk,OFFICEPRIMARY_START,OFFICEPRIMARY_END) * 40 as HoursWorkedPrimary
,datediff(wk,OFFICESECONDARY_START,OFFICESECONDARY_END) * 40 as HoursWorkedSecondary
,EMP_NAME
,OFFICEPRIMARY_NAME,OFFICEPRIMARY_START,OFFICEPRIMARY_END
,OFFICESECONDARY_NAME,OFFICESECONDARY_START,OFFICESECONDARY_END
from [EMP]
inner join OFFICEPRIMARY as op on op.OFFICEPRIMARY_EMP_ID = EMP.EMP_ID
inner join OFFICESECONDARY as os on os.OFFICESECONDARY_EMP_ID = EMP.EMP_ID

The link below should help point you in the right direction to identifying how the dates overlap.
Count days in date range with set of exclusions which may overlap
