I want to compare two tables with identical columns and find out which column values changed.
Here is an example with sample data.
The employee_original table has six columns.
CREATE TABLE [dbo].[employee_original](
[emp_id] [int] IDENTITY(1,1) NOT NULL,
[first_name] [varchar](100) NOT NULL,
[last_name] [varchar](100) NOT NULL,
[salary] int NOT NULL,
[city] [varchar](20) NOT NULL,
[department] [varchar](20) NOT NULL,
PRIMARY KEY CLUSTERED
(
[emp_id] ASC
)WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON, FILLFACTOR = 90) ON [PRIMARY]
) ON [PRIMARY]
GO
INSERT INTO employee_original VALUES ( 'Julia', 'Schultz', 100, 'New York', 'Tech');
INSERT INTO employee_original VALUES ( 'Vincent', 'Trantow', 200, 'Moscow', 'HR');
INSERT INTO employee_original VALUES ( 'Whitney ', 'Pouros', 500, 'Miami', 'Accounting');
INSERT INTO employee_original VALUES ( 'Chandler', 'Osinski', 10, 'Singapore', 'Purchasing');
INSERT INTO employee_original VALUES ( 'Sydnie', 'Green', 700, 'Ireland', 'Operations');
INSERT INTO employee_original VALUES ( 'Josefa', 'Anderson', 800, 'Berlin', 'Purchase');
INSERT INTO employee_original VALUES ( 'Brayan', 'Bergstrom', 900, 'New York', 'Operations');
INSERT INTO employee_original VALUES ( 'Shyanne', 'Kris', 900, 'New York', 'Sales');
employee_modified has the same employees, but some attributes have changed for a few of them.
CREATE TABLE [dbo].[employee_modified](
[emp_id] [int] IDENTITY(1,1) NOT NULL,
[first_name] [varchar](100) NOT NULL,
[last_name] [varchar](100) NOT NULL,
[salary] int NOT NULL,
[city] [varchar](20) NOT NULL,
[department] [varchar](20) NOT NULL,
PRIMARY KEY CLUSTERED
(
[emp_id] ASC
)WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON, FILLFACTOR = 90) ON [PRIMARY]
) ON [PRIMARY]
GO
INSERT INTO employee_modified VALUES ( 'Julia', 'Schultz', 100, 'New York', 'Tech');
INSERT INTO employee_modified VALUES ( 'Vincent', 'Wyman', 500, 'Moscow', 'HR');
INSERT INTO employee_modified VALUES ( 'Whitney ', 'Pouros', 500, 'Miami', 'Sales');
INSERT INTO employee_modified VALUES ( 'Chandler', 'Osinski', 10, 'Singapore', 'Purchasing');
INSERT INTO employee_modified VALUES ( 'Sydnie', ' Cartwright', 900, 'Ireland', 'Operations');
INSERT INTO employee_modified VALUES ( 'Joseph', 'Anderson', 800, 'Berlin', 'Purchase');
INSERT INTO employee_modified VALUES ( 'Bryan', 'Bergstrom', 900, 'Naples', 'Operations');
INSERT INTO employee_modified VALUES ( 'Shyanne', 'Jakubowski', 900, 'New York', 'Accounting');
I am looking for a result that tells me which field changed for which employee.
For example, emp_id = 2 has a last name change and a salary change, so the output should look like:
emp_id  attribute  original_value  new_value
2       last_name  Trantow         Wyman
2       salary     200             500
This is what I have tried so far:
(1) Join the tables and find what changed:
DROP TABLE IF EXISTS #temp;
SELECT distinct
o.emp_id,
o.first_name [original_first_name], m.first_name [modified_first_name],
o.last_name [original_last_name], m.last_name [modified_last_name],
o.salary [original_salary], m.salary [modified_salary],
o.city [original_city], m.city [modified_city],
o.department [original_department], m.department [modified_department]
into #temp from
[dbo].[employee_original] o inner join [dbo].[employee_modified] m on o.emp_id = m.emp_id
select * from #temp
Gives me
(2) Self join with #temp and find out what attribute has changed.
-- All Last Name Changes.
select distinct t1.emp_id, t1.original_last_name, t2.modified_last_name
from #temp t1
inner join #temp t2 on t1.emp_id = t2.emp_id
where t1.original_last_name <> t2.modified_last_name
-- All Department changes
select distinct t1.emp_id, t1.original_department, t2.modified_department
from #temp t1
inner join #temp t2 on t1.emp_id = t2.emp_id
where t1.original_department <> t2.modified_department
Any pointers on how I can get to my desired result?
Here is an option that will dynamically unpivot your data without actually using dynamic SQL.
Example
Select emp_id
,[key]
,Org_Value = max( case when Src=1 then Value end)
,New_Value = max( case when Src=2 then Value end)
From (
Select Src=1
,emp_id
,B.*
From [employee_original] A
Cross Apply ( Select [Key]
,Value
From OpenJson( (Select A.* For JSON Path,Without_Array_Wrapper,INCLUDE_NULL_VALUES ) )
) B
Union All
Select Src=2
,emp_id
,B.*
From [employee_modified] A
Cross Apply ( Select [Key]
,Value
From OpenJson( (Select A.* For JSON Path,Without_Array_Wrapper,INCLUDE_NULL_VALUES ) )
) B
) A
Group By emp_id,[key]
Having max( case when Src=1 then Value end)
<> max( case when Src=2 then Value end)
Order By emp_id,[key]
Results
You can use the following code to unpivot all possible changes
SELECT
o.emp_id,
v.column_name,
v.old_value,
v.new_value
FROM employee_original o
JOIN employee_modified m ON o.emp_id = m.emp_id
CROSS APPLY (
SELECT 'first_name', CAST(o.first_name AS nvarchar(max)), CAST(m.first_name AS nvarchar(max))
WHERE o.first_name <> m.first_name
UNION ALL
SELECT 'last_name', o.last_name, m.last_name
WHERE o.last_name <> m.last_name
UNION ALL
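-- Cast salary as well: without it, data type precedence makes the UNION ALL column int and the string values fail to convert
SELECT 'salary', CAST(o.salary AS nvarchar(max)), CAST(m.salary AS nvarchar(max))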
WHERE o.salary <> m.salary
UNION ALL
SELECT 'city', o.city, m.city
WHERE o.city <> m.city
UNION ALL
SELECT 'department', o.department, m.department
WHERE o.department <> m.department
) v(column_name, old_value, new_value);
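As a rough illustration of the per-column comparison above, here is a minimal sketch that generates one UNION ALL branch per column. It runs against SQLite through Python rather than SQL Server, uses a cut-down hypothetical version of the two tables, and the `COLUMNS` list and `changed_values` helper are names invented for this example.

```python
import sqlite3

# Hypothetical miniature of the two tables; SQLite stands in for SQL Server.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE employee_original (emp_id INTEGER PRIMARY KEY, last_name TEXT, salary INTEGER);
CREATE TABLE employee_modified (emp_id INTEGER PRIMARY KEY, last_name TEXT, salary INTEGER);
INSERT INTO employee_original VALUES (1, 'Schultz', 100), (2, 'Trantow', 200);
INSERT INTO employee_modified VALUES (1, 'Schultz', 100), (2, 'Wyman', 500);
""")

COLUMNS = ["last_name", "salary"]  # columns to compare (assumed, trusted list)


def changed_values(conn, columns):
    """Return (emp_id, attribute, original_value, new_value) for each changed cell."""
    # One SELECT per column, glued with UNION ALL. Every value is cast to TEXT
    # so that all branches share a single type.
    branches = [
        f"SELECT o.emp_id, '{col}', CAST(o.{col} AS TEXT), CAST(m.{col} AS TEXT) "
        f"FROM employee_original o JOIN employee_modified m ON o.emp_id = m.emp_id "
        f"WHERE o.{col} <> m.{col}"
        for col in columns
    ]
    sql = " UNION ALL ".join(branches) + " ORDER BY 1, 2"
    return conn.execute(sql).fetchall()


rows = changed_values(conn, COLUMNS)
for row in rows:
    print(row)
```

The column names are interpolated into the SQL text, so the list should come from trusted metadata and never from user input. Nullable columns would need a NULL-aware comparison instead of `<>`.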
I am trying to get the second value based on date. Suppose a user has three entries with dates; the entry with the second-latest date should be retrieved along with its value. If a user has only one entry, that entry should be returned. My sample input looks like this:
UserId Date Amount
1001 2019-10-10 00:00:00.000 10000
1001 2018-01-01 00:00:00.000 20000
1001 2017-10-02 00:00:00.000 6000
1002 2017-10-10 00:00:00.000 1000
1002 2016-08-02 00:00:00.000 600
1003 2015-06-10 00:00:00.000 200
Expected output:
UserId Date Amount
1001 2018-01-01 00:00:00.000 20000
1002 2016-08-02 00:00:00.000 600
1003 2015-06-10 00:00:00.000 200
I hope the samples above are clear enough. I tried the following queries to make it work:
SELECT DISTINCT m.UserId, m.Amount FROM UserAmount m WHERE m.DatePosted =
(SELECT MAX(k.DatePosted) FROM UserAmount k WHERE
k.DatePosted < (SELECT MAX(p.DatePosted) FROM UserAmount p));
SELECT DISTINCT m.UserId, m.Amount FROM UserAmount m WHERE m.UserId IN (SELECT q.UserId FROM DetailsUser q) AND m.DatePosted =
(SELECT MAX(k.DatePosted) FROM UserAmount k WHERE k.UserId IN (SELECT r.UserId FROM DetailsUser r) AND
k.DatePosted < (SELECT MAX(p.DatePosted) FROM UserAmount p WHERE p.UserId IN (SELECT s.UserId FROM DetailsUser s)));
Unfortunately, I only get a result for the first id, 1001:
UserId Amount
1001 20000
Did I skip something or do something wrong in the query? Any suggestions to make it work would be appreciated.
Script:
USE [DbName]
GO
/****** Object: Table [dbo].[UserAmount] Script Date: 04/16/2019 23:42:15 ******/
SET ANSI_NULLS ON
GO
SET QUOTED_IDENTIFIER ON
GO
CREATE TABLE [dbo].[UserAmount](
[Id] [int] IDENTITY(1,1) NOT NULL,
[UserId] [nvarchar](20) NULL,
[DatePosted] [datetime] NULL,
[Amount] [float] NULL,
CONSTRAINT [PK_UserAmount] PRIMARY KEY CLUSTERED
(
[Id] ASC
)WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]
) ON [PRIMARY]
GO
SET IDENTITY_INSERT [dbo].[UserAmount] ON
INSERT [dbo].[UserAmount] ([Id], [UserId], [DatePosted], [Amount]) VALUES (1, N'1001', CAST(0x0000AAE200000000 AS DateTime), 10000)
INSERT [dbo].[UserAmount] ([Id], [UserId], [DatePosted], [Amount]) VALUES (2, N'1001', CAST(0x0000A85B00000000 AS DateTime), 20000)
INSERT [dbo].[UserAmount] ([Id], [UserId], [DatePosted], [Amount]) VALUES (3, N'1001', CAST(0x0000A80000000000 AS DateTime), 6000)
INSERT [dbo].[UserAmount] ([Id], [UserId], [DatePosted], [Amount]) VALUES (4, N'1002', CAST(0x0000A80800000000 AS DateTime), 1000)
INSERT [dbo].[UserAmount] ([Id], [UserId], [DatePosted], [Amount]) VALUES (5, N'1002', CAST(0x0000A65600000000 AS DateTime), 600)
INSERT [dbo].[UserAmount] ([Id], [UserId], [DatePosted], [Amount]) VALUES (6, N'1003', CAST(0x0000A4B300000000 AS DateTime), 200)
SET IDENTITY_INSERT [dbo].[UserAmount] OFF
/****** Object: Table [dbo].[DetailsUser] Script Date: 04/16/2019 23:42:15 ******/
SET ANSI_NULLS ON
GO
SET QUOTED_IDENTIFIER ON
GO
CREATE TABLE [dbo].[DetailsUser](
[Id] [int] IDENTITY(1,1) NOT NULL,
[UserId] [nvarchar](20) NULL,
CONSTRAINT [PK_DetailsUser] PRIMARY KEY CLUSTERED
(
[Id] ASC
)WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]
) ON [PRIMARY]
GO
SET IDENTITY_INSERT [dbo].[DetailsUser] ON
INSERT [dbo].[DetailsUser] ([Id], [UserId]) VALUES (1, N'1001')
INSERT [dbo].[DetailsUser] ([Id], [UserId]) VALUES (2, N'1002')
INSERT [dbo].[DetailsUser] ([Id], [UserId]) VALUES (3, N'1003')
SET IDENTITY_INSERT [dbo].[DetailsUser] OFF
N.B.: The query can be written for either MS SQL or Oracle.
A simple way would be to use window functions and pick the second record.
Given your above setup:
SELECT s1.UserID, s1.Amount, s1.DatePosted
FROM (
SELECT du.UserID, ua.Amount, ua.DatePosted
, ROW_NUMBER() OVER ( PARTITION BY ua.UserID ORDER BY ua.DatePosted DESC ) AS rn
, COUNT(*) OVER ( PARTITION BY ua.UserID) AS theCount
FROM DetailsUser du
LEFT OUTER JOIN UserAmount ua ON du.userID = ua.UserID
) s1
WHERE s1.rn = 2 OR s1.theCount <=1
https://dbfiddle.uk/?rdbms=sqlserver_2012&fiddle=7035366e57188a3508e7348f0fe0ce8b
That will work on SQL Server and Oracle, but unfortunately not on MySQL 5.x (window functions were not introduced until MySQL 8). PostgreSQL has had window functions for a while. I'm not sure which other flavors of SQL have them, but the same functionality can be duplicated in standard SQL.
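To illustrate the remark about standard SQL, here is a minimal sketch without window functions, using correlated COUNT subqueries. It runs on SQLite through Python instead of SQL Server, stores the dates as ISO text, and assumes each user has distinct dates; ties would need an extra tie-breaker.

```python
import sqlite3

# Sample rows from the question, loaded into an in-memory SQLite database.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE UserAmount (UserId TEXT, DatePosted TEXT, Amount REAL);
INSERT INTO UserAmount VALUES
  ('1001', '2019-10-10', 10000), ('1001', '2018-01-01', 20000), ('1001', '2017-10-02', 6000),
  ('1002', '2017-10-10', 1000),  ('1002', '2016-08-02', 600),
  ('1003', '2015-06-10', 200);
""")

# A row qualifies when exactly one row of the same user is newer (so it is the
# second-latest), or when it is that user's only row.
SECOND_LATEST_SQL = """
SELECT m.UserId, m.DatePosted, m.Amount
FROM UserAmount m
WHERE (SELECT COUNT(*) FROM UserAmount k
       WHERE k.UserId = m.UserId AND k.DatePosted > m.DatePosted) = 1
   OR (SELECT COUNT(*) FROM UserAmount k WHERE k.UserId = m.UserId) = 1
ORDER BY m.UserId
"""

second_latest = conn.execute(SECOND_LATEST_SQL).fetchall()
for row in second_latest:
    print(row)
```

The correlated subqueries scan the table once per row, so on large tables the window-function version is likely the better choice where it is available.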
You can do this using apply as well:
select ua.*
from UserAmount ua outer apply
(select ua2.DatePosted
from UserAmount ua2
where ua2.UserId = ua.UserId
-- the second-latest date for this user, if there is one
order by ua2.DatePosted desc
offset 1 row fetch first 1 row only
) s
where s.DatePosted is null or s.DatePosted = ua.DatePosted;
I am trying to create a unit-wise employee service length report based on joining date, and have used the following query:
SELECT o.UnitName, p.DeptName, COUNT(m.EmpId) AS cnt,
(SELECT COUNT(m.EmpId) FROM EmpInf m
WHERE m.Desg IN ('Jr. Operator', 'Operator') AND m.Active = 'Active' AND m.DeptId = 2
AND DATEDIFF(MONTH, m.Joindate, GETDATE()) BETWEEN 0 AND 6) AS '0 - 6 Months'
FROM EmpInf m
INNER JOIN Department k ON k.DeptId = m.DeptId
INNER JOIN Section l ON l.secId = m.SecID
INNER JOIN UnitInf o ON o.UnitID = l.UnitName
INNER JOIN Department p ON p.DeptId = m.DeptId
WHERE Desg IN ('Jr. Operator', 'Operator') AND Active = 'Active' AND p.DeptName = 'Production'
GROUP BY o.UnitName, p.DeptName
The expected output is below. Units 1 and 4 have employees who joined in 2017, so they fall in the 0 - 6 months bucket; there will be other buckets as well, such as 7 - 12 and 13 - 24 months.
Currently getting this:
I guess there is an issue with the query, and I would be glad to know of any fixes or alternatives.
Below is the script:
USE [sample]
GO
/****** Object: Table [dbo].[UnitInf] Script Date: 05/11/2017 21:19:34 ******/
SET ANSI_NULLS ON
GO
SET QUOTED_IDENTIFIER ON
GO
CREATE TABLE [dbo].[UnitInf](
[UnitID] [int] IDENTITY(1,1) NOT NULL,
[UnitName] [nvarchar](100) NULL,
CONSTRAINT [PK_UnitInf] PRIMARY KEY CLUSTERED
(
[UnitID] ASC
)WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]
) ON [PRIMARY]
GO
SET IDENTITY_INSERT [dbo].[UnitInf] ON
INSERT [dbo].[UnitInf] ([UnitID], [UnitName]) VALUES (1, N'Unit-01')
INSERT [dbo].[UnitInf] ([UnitID], [UnitName]) VALUES (2, N'Unit-02')
INSERT [dbo].[UnitInf] ([UnitID], [UnitName]) VALUES (3, N'Unit-03')
INSERT [dbo].[UnitInf] ([UnitID], [UnitName]) VALUES (4, N'Unit-04')
INSERT [dbo].[UnitInf] ([UnitID], [UnitName]) VALUES (5, N'Unit-05')
SET IDENTITY_INSERT [dbo].[UnitInf] OFF
/****** Object: Table [dbo].[Section] Script Date: 05/11/2017 21:19:34 ******/
SET ANSI_NULLS ON
GO
SET QUOTED_IDENTIFIER ON
GO
CREATE TABLE [dbo].[Section](
[secId] [int] IDENTITY(1,1) NOT NULL,
[SecName] [nvarchar](100) NULL,
[UnitName] [int] NULL,
CONSTRAINT [PK_Section] PRIMARY KEY CLUSTERED
(
[secId] ASC
)WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]
) ON [PRIMARY]
GO
SET IDENTITY_INSERT [dbo].[Section] ON
INSERT [dbo].[Section] ([secId], [SecName], [UnitName]) VALUES (1, N'B-001', 1)
INSERT [dbo].[Section] ([secId], [SecName], [UnitName]) VALUES (2, N'C-001', 2)
INSERT [dbo].[Section] ([secId], [SecName], [UnitName]) VALUES (3, N'B-002', 1)
INSERT [dbo].[Section] ([secId], [SecName], [UnitName]) VALUES (4, N'D-004', 4)
SET IDENTITY_INSERT [dbo].[Section] OFF
/****** Object: Table [dbo].[EmpInf] Script Date: 05/11/2017 21:19:34 ******/
SET ANSI_NULLS ON
GO
SET QUOTED_IDENTIFIER ON
GO
CREATE TABLE [dbo].[EmpInf](
[EmpId] [int] IDENTITY(1,1) NOT NULL,
[DeptId] [int] NULL,
[SecID] [int] NULL,
[EmpName] [nvarchar](100) NULL,
[GrossSal] [float] NULL,
[Desg] [nvarchar](100) NULL,
[SkillBonus] [float] NULL,
[Active] [nvarchar](10) NULL,
[JoinDate] [datetime] NULL,
CONSTRAINT [PK_EmpInf] PRIMARY KEY CLUSTERED
(
[EmpId] ASC
)WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]
) ON [PRIMARY]
GO
SET IDENTITY_INSERT [dbo].[EmpInf] ON
INSERT [dbo].[EmpInf] ([EmpId], [DeptId], [SecID], [EmpName], [GrossSal], [Desg], [SkillBonus], [Active], [JoinDate]) VALUES (1, 2, 2, N'John', 10000, N'Operator', 2000, N'Active', CAST(0x0000A59F00000000 AS DateTime))
INSERT [dbo].[EmpInf] ([EmpId], [DeptId], [SecID], [EmpName], [GrossSal], [Desg], [SkillBonus], [Active], [JoinDate]) VALUES (2, 2, 2, N'Jack', 12000, N'Operator', 5000, N'Active', CAST(0x0000A5BC00000000 AS DateTime))
INSERT [dbo].[EmpInf] ([EmpId], [DeptId], [SecID], [EmpName], [GrossSal], [Desg], [SkillBonus], [Active], [JoinDate]) VALUES (3, 2, 4, N'Nick', 14000, N'Jr. Operator', 6000, N'Active', CAST(0x0000A75100000000 AS DateTime))
INSERT [dbo].[EmpInf] ([EmpId], [DeptId], [SecID], [EmpName], [GrossSal], [Desg], [SkillBonus], [Active], [JoinDate]) VALUES (4, 2, 4, N'Bruce', 15000, N'Operator', 7000, N'Active', CAST(0x0000A79000000000 AS DateTime))
INSERT [dbo].[EmpInf] ([EmpId], [DeptId], [SecID], [EmpName], [GrossSal], [Desg], [SkillBonus], [Active], [JoinDate]) VALUES (5, 2, 1, N'Willy', 16000, N'Jr. Operator', 8000, N'Active', CAST(0x0000A7B800000000 AS DateTime))
SET IDENTITY_INSERT [dbo].[EmpInf] OFF
/****** Object: Table [dbo].[Department] Script Date: 05/11/2017 21:19:34 ******/
SET ANSI_NULLS ON
GO
SET QUOTED_IDENTIFIER ON
GO
CREATE TABLE [dbo].[Department](
[DeptId] [int] IDENTITY(1,1) NOT NULL,
[DeptName] [nvarchar](100) NULL,
CONSTRAINT [PK_Department] PRIMARY KEY CLUSTERED
(
[DeptId] ASC
)WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]
) ON [PRIMARY]
GO
SET IDENTITY_INSERT [dbo].[Department] ON
INSERT [dbo].[Department] ([DeptId], [DeptName]) VALUES (1, N'Admin')
INSERT [dbo].[Department] ([DeptId], [DeptName]) VALUES (2, N'Production')
SET IDENTITY_INSERT [dbo].[Department] OFF
I think that simple conditional aggregation is the best approach:
SELECT o.UnitName, p.DeptName, COUNT(m.EmpId) AS cnt,
SUM(CASE WHEN DATEDIFF(MONTH, m.Joindate, GETDATE()) BETWEEN 0 AND 6
THEN 1 ELSE 0
END) AS [0 - 6 Months]
FROM EmpInf m INNER JOIN
Department k
ON k.DeptId = m.DeptId INNER JOIN
Section l
ON l.secId = m.SecID INNER JOIN
UnitInf o
ON o.UnitID = l.UnitName INNER JOIN
Department p
ON p.DeptId = m.DeptId
WHERE Desg IN ('Jr. Operator', 'Operator') AND Active = 'Active' AND
p.DeptName = 'Production'
GROUP BY o.UnitName, p.DeptName
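The same pattern extends to the other buckets the question mentions. Below is a minimal sketch on SQLite through Python, not SQL Server: the `Emp` table and its join dates are hypothetical, the reference date is fixed so the result is reproducible, and the month difference is computed by hand because SQLite has no DATEDIFF (the expression counts month boundaries, which is how DATEDIFF(MONTH, ...) behaves).

```python
import sqlite3

# Hypothetical, flattened sample data: one row per employee with the unit name.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Emp (EmpId INTEGER, UnitName TEXT, JoinDate TEXT);
INSERT INTO Emp VALUES
  (1, 'Unit-01', '2017-03-01'),
  (2, 'Unit-01', '2016-08-15'),
  (3, 'Unit-02', '2015-12-20'),
  (4, 'Unit-04', '2017-01-10');
""")

TODAY = "2017-05-11"  # fixed stand-in for GETDATE()

# Inner query: months between JoinDate and the reference date.
# Outer query: one SUM(CASE ...) column per service-length bucket.
BUCKETS_SQL = """
SELECT UnitName,
       COUNT(*) AS cnt,
       SUM(CASE WHEN months BETWEEN 0  AND 6  THEN 1 ELSE 0 END) AS m0_6,
       SUM(CASE WHEN months BETWEEN 7  AND 12 THEN 1 ELSE 0 END) AS m7_12,
       SUM(CASE WHEN months BETWEEN 13 AND 24 THEN 1 ELSE 0 END) AS m13_24
FROM (
    SELECT UnitName,
           (CAST(strftime('%Y', :today) AS INTEGER) * 12 + CAST(strftime('%m', :today) AS INTEGER))
         - (CAST(strftime('%Y', JoinDate) AS INTEGER) * 12 + CAST(strftime('%m', JoinDate) AS INTEGER)) AS months
    FROM Emp
)
GROUP BY UnitName
ORDER BY UnitName
"""

buckets = conn.execute(BUCKETS_SQL, {"today": TODAY}).fetchall()
for row in buckets:
    print(row)
```

Each employee is read once and counted in exactly one bucket column, which avoids running a separate subquery per bucket.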
I haven't run your code, but it looks like you're hardcoding too much in your subquery.
(SELECT COUNT(m.EmpId) FROM EmpInf m
WHERE m.Desg IN ('Jr. Operator', 'Operator') AND m.Active = 'Active' AND m.DeptId = 2
AND DATEDIFF(MONTH, m.Joindate, GETDATE()) BETWEEN 0 AND 6) AS '0 - 6 Months'
Problem 1: your alias for EmpInf in the subquery is "m" and your alias for EmpInf in your main query is "m". Make them different so you can link them.
Problem 2: Connect your variables in the subquery to values in the main query. So:
(SELECT COUNT(subm.EmpId) FROM EmpInf subm
WHERE subm.Desg=m.Desg AND subm.Active = m.Active AND subm.DeptId = m.DeptId
AND DATEDIFF(MONTH, subm.Joindate, GETDATE()) BETWEEN 0 AND 6) AS '0 - 6 Months'
Goal:
If the input value is -10, the function should not apply the WHERE clause.
Problem:
I do not know how to solve this in this context. The WHERE clause has to be applied or skipped depending on the input value.
Info:
If -10 is passed as the input, all rows from [dbo].[testing] should be returned, and it is fine for the columns from [dbo].[testing2] to be NULL because of the LEFT JOIN.
*The code and its data are a sample from production.
Thank you!
CREATE TABLE [dbo].[testing](
[id] [int] NULL,
[value] [varchar](30) NULL,
[category] [int] NULL
) ON [PRIMARY]
CREATE TABLE [dbo].[testing2](
[id] [int] NULL,
[value] [varchar](30) NULL,
[category] [int] NULL,
[test_id] [int] NULL,
[id_type] [int] NOT NULL
) ON [PRIMARY]
CREATE FUNCTION dbo.testt (
@data int
)
RETURNS TABLE
AS
RETURN
(
SELECT
a.[id],
a.[value],
a.[category],
b.[id_type]
FROM [dbo].[testing] a left join [dbo].[testing2] b on a.id = b.[id]
where b.[id_type] = @data
)
INSERT INTO [test].[dbo].[testing] VALUES
(1, '', 2), (2, '', 3), (3, 'a', 2), (4, 'a', 2),
(5, 'b', 2), (6, 'b', 2), (7, 'c', 2), (8, 'c', 2),
(9, 'c', 2), (10, 'c', 2);
INSERT INTO [test].[dbo].[testing2] VALUES
(3, 'a' ,2 ,11 ,1), (4, 'a' ,2 ,11 ,1),
(5, 'a' ,2 ,11 ,0), (6, 'a' ,2 ,11 ,2);
select
s.[id],
s.[value],
s.[category],
s.[id_type]
from dbo.testt(1) s
Have your WHERE clause check whether @data is either -10 or matches b.[id_type].
WHERE (@data = -10) OR (b.[id_type] = @data)
What about where b.[id_type] = @data OR @data = -10 in the testt function?
So your function would be:
CREATE FUNCTION dbo.testt (
@data int
)
RETURNS TABLE
AS
RETURN
(
SELECT
a.[id],
a.[value],
a.[category],
b.[id_type]
FROM [dbo].[testing] a
LEFT JOIN [dbo].[testing2] b on a.id = b.[id]
WHERE b.[id_type] = @data OR @data = -10
)
I'm looking for some advice on the approach I should take with a query. I have a table (EMP) which stores employee details and working hours for this year (40 hours per week). A further 2 tables store the primary and secondary offices employees belong to. Since employees can move between offices, these are stored with dates.
I'm looking to return the number of working hours during the time the employee is in an office. If primary offices overlap with secondary offices for an employee, the hours should be split by the number of overlapping offices for the overlapping period only.
I attach sample DDL below.
-- Employee Table with hours for year 2014
CREATE TABLE [dbo].[EMP](
[EMP_ID] [int] NOT NULL,
[EMP_NAME] [varchar](255) NULL,
[EMP_FYHOURS] [float] NULL,
CONSTRAINT [PK_EMP] PRIMARY KEY CLUSTERED
(
[EMP_ID] ASC
)WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON, FILLFACTOR = 80) ON [PRIMARY]
) ON [PRIMARY]
GO
-- Employees and their primary offices
CREATE TABLE [dbo].[OFFICEPRIMARY](
[OFFICEPRIMARY_ID] [int] NOT NULL,
[OFFICEPRIMARY_NAME] [varchar](255) NULL,
[OFFICEPRIMARY_EMP_ID] [int] NOT NULL,
[OFFICEPRIMARY_START] [datetime] NULL,
[OFFICEPRIMARY_END] [datetime] NULL,
CONSTRAINT [PK_OFFICEPRIMARY] PRIMARY KEY CLUSTERED
(
[OFFICEPRIMARY_ID] ASC
)WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON, FILLFACTOR = 80) ON [PRIMARY]
) ON [PRIMARY]
GO
SET ANSI_PADDING OFF
GO
ALTER TABLE [dbo].[OFFICEPRIMARY] WITH CHECK ADD CONSTRAINT [FK_OFFICEPRIMARY_FK1] FOREIGN KEY([OFFICEPRIMARY_EMP_ID])
REFERENCES [dbo].[EMP] ([EMP_ID])
ON DELETE CASCADE
GO
ALTER TABLE [dbo].[OFFICEPRIMARY] CHECK CONSTRAINT [FK_OFFICEPRIMARY_FK1]
GO
-- Employees and their secondary offices
CREATE TABLE [dbo].[OFFICESECONDARY](
[OFFICESECONDARY_ID] [int] NOT NULL,
[OFFICESECONDARY_NAME] [varchar](255) NULL,
[OFFICESECONDARY_EMP_ID] [int] NOT NULL,
[OFFICESECONDARY_START] [datetime] NULL,
[OFFICESECONDARY_END] [datetime] NULL,
CONSTRAINT [PK_OFFICESECONDARY] PRIMARY KEY CLUSTERED
(
[OFFICESECONDARY_ID] ASC
)WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON, FILLFACTOR = 80) ON [PRIMARY]
) ON [PRIMARY]
GO
SET ANSI_PADDING OFF
GO
ALTER TABLE [dbo].[OFFICESECONDARY] WITH CHECK ADD CONSTRAINT [FK_OFFICESECONDARY_FK1] FOREIGN KEY([OFFICESECONDARY_EMP_ID])
REFERENCES [dbo].[EMP] ([EMP_ID])
ON DELETE CASCADE
GO
ALTER TABLE [dbo].[OFFICESECONDARY] CHECK CONSTRAINT [FK_OFFICESECONDARY_FK1]
GO
-- Insert sample data
INSERT INTO EMP (EMP_ID, EMP_NAME, EMP_FYHOURS)
VALUES (1, 'John Smith', 2080);
INSERT INTO EMP (EMP_ID, EMP_NAME, EMP_FYHOURS)
VALUES (2, 'Jane Doe', 2080);
GO
INSERT INTO OFFICEPRIMARY (OFFICEPRIMARY_ID, OFFICEPRIMARY_NAME, OFFICEPRIMARY_EMP_ID, OFFICEPRIMARY_START, OFFICEPRIMARY_END)
VALUES (1, 'London', 1, '2014-01-01', '2014-05-31')
INSERT INTO OFFICEPRIMARY (OFFICEPRIMARY_ID, OFFICEPRIMARY_NAME, OFFICEPRIMARY_EMP_ID, OFFICEPRIMARY_START, OFFICEPRIMARY_END)
VALUES (2, 'Berlin', 1, '2014-06-01', '2014-08-31')
INSERT INTO OFFICEPRIMARY (OFFICEPRIMARY_ID, OFFICEPRIMARY_NAME, OFFICEPRIMARY_EMP_ID, OFFICEPRIMARY_START, OFFICEPRIMARY_END)
VALUES (3, 'New York', 1, '2014-09-01', '2014-12-31')
INSERT INTO OFFICEPRIMARY (OFFICEPRIMARY_ID, OFFICEPRIMARY_NAME, OFFICEPRIMARY_EMP_ID, OFFICEPRIMARY_START, OFFICEPRIMARY_END)
VALUES (4, 'New York', 2, '2014-01-01', '2014-04-15')
INSERT INTO OFFICEPRIMARY (OFFICEPRIMARY_ID, OFFICEPRIMARY_NAME, OFFICEPRIMARY_EMP_ID, OFFICEPRIMARY_START, OFFICEPRIMARY_END)
VALUES (5, 'Paris', 2, '2014-04-16', '2014-09-30')
INSERT INTO OFFICEPRIMARY (OFFICEPRIMARY_ID, OFFICEPRIMARY_NAME, OFFICEPRIMARY_EMP_ID, OFFICEPRIMARY_START, OFFICEPRIMARY_END)
VALUES (6, 'London', 2, '2014-10-01', '2014-12-31')
GO
INSERT INTO OFFICESECONDARY (OFFICESECONDARY_ID, OFFICESECONDARY_NAME, OFFICESECONDARY_EMP_ID, OFFICESECONDARY_START, OFFICESECONDARY_END)
VALUES (1, 'Paris', 1, '2014-01-01', '2014-03-31')
INSERT INTO OFFICESECONDARY (OFFICESECONDARY_ID, OFFICESECONDARY_NAME, OFFICESECONDARY_EMP_ID, OFFICESECONDARY_START, OFFICESECONDARY_END)
VALUES (2, 'Lyon', 1, '2014-04-01', '2014-05-15')
INSERT INTO OFFICESECONDARY (OFFICESECONDARY_ID, OFFICESECONDARY_NAME, OFFICESECONDARY_EMP_ID, OFFICESECONDARY_START, OFFICESECONDARY_END)
VALUES (3, 'Berlin', 1, '2014-05-16', '2014-09-30')
INSERT INTO OFFICESECONDARY (OFFICESECONDARY_ID, OFFICESECONDARY_NAME, OFFICESECONDARY_EMP_ID, OFFICESECONDARY_START, OFFICESECONDARY_END)
VALUES (4, 'Chicago', 1, '2014-10-01', '2015-02-22')
INSERT INTO OFFICESECONDARY (OFFICESECONDARY_ID, OFFICESECONDARY_NAME, OFFICESECONDARY_EMP_ID, OFFICESECONDARY_START, OFFICESECONDARY_END)
VALUES (5, 'Chicago', 2, '2013-11-21', '2014-04-10')
INSERT INTO OFFICESECONDARY (OFFICESECONDARY_ID, OFFICESECONDARY_NAME, OFFICESECONDARY_EMP_ID, OFFICESECONDARY_START, OFFICESECONDARY_END)
VALUES (6, 'Berlin', 2, '2014-04-11', '2014-09-16')
INSERT INTO OFFICESECONDARY (OFFICESECONDARY_ID, OFFICESECONDARY_NAME, OFFICESECONDARY_EMP_ID, OFFICESECONDARY_START, OFFICESECONDARY_END)
VALUES (7, 'Amsterdam', 2, '2014-09-17', '2015-03-31')
GO
Thanks for the pointer. I adjusted your query so it presents a union of the primary and secondary office.
All that remains is working out the hours for overlapping periods between offices. For example,
John Smith, New York, 01/04/2014, 10/08/2014
John Smith, London, 01/08/2014, 31/12/2014
For the overlapping period between the offices which is 01/08/2014 to 10/08/2014, I would expect the hours to be split equally. If there were 3 overlapping offices, then it would be split 3-ways.
select 'Primary' as Office, e.EMP_NAME, op.OFFICEPRIMARY_NAME, op.OFFICEPRIMARY_START, op.OFFICEPRIMARY_END, datediff(wk,OFFICEPRIMARY_START,OFFICEPRIMARY_END) * 40 as HoursWorkedPrimary
from EMP e
inner join OFFICEPRIMARY op on op.OFFICEPRIMARY_EMP_ID = e.EMP_ID
union all
select 'Secondary' as Office, e.EMP_NAME, os.OFFICESECONDARY_NAME, os.OFFICESECONDARY_START, os.OFFICESECONDARY_END, datediff(wk,OFFICESECONDARY_START,OFFICESECONDARY_END) * 40 as HoursWorkedSecondary
from EMP e
inner join OFFICESECONDARY os on os.OFFICESECONDARY_EMP_ID = e.EMP_ID
order by e.EMP_NAME
If I understand correctly, the end result you want to see is the number of total hours worked per employee and office?
I've come up with this:
-- generate date table
declare @MinDate datetime, @MaxDate datetime
SET @MinDate = (SELECT MIN(d) FROM (SELECT d = OFFICEPRIMARY_START FROM dbo.OFFICEPRIMARY UNION SELECT OFFICESECONDARY_START FROM dbo.OFFICESECONDARY) a)
SET @MaxDate = (SELECT MAX(d) FROM (SELECT d = OFFICEPRIMARY_END FROM dbo.OFFICEPRIMARY UNION SELECT OFFICESECONDARY_END FROM dbo.OFFICESECONDARY) a)
SELECT
d = DATEADD(day, number, @MinDate)
INTO
#tmp_dates
FROM
(SELECT DISTINCT number FROM master.dbo.spt_values WHERE name IS NULL) n
WHERE
DATEADD(day, number, @MinDate) < @MaxDate
;WITH CTE AS
(
SELECT
d.d
,o.OfficeType
,o.OfficeID
,o.OfficeName
,o.EmpID
,EmpName = e.EMP_NAME
,HoursWorked = 8.0 / (COUNT(1) OVER (PARTITION BY EmpID, d)) -- 8.0 avoids integer division when splitting three ways
FROM
(
SELECT
OfficeType = 1
,OfficeID = op.OFFICEPRIMARY_ID
,OfficeName = op.OFFICEPRIMARY_NAME
,EmpID = op.OFFICEPRIMARY_EMP_ID
,StartDate = op.OFFICEPRIMARY_START
,EndDate = op.OFFICEPRIMARY_END
FROM
dbo.OFFICEPRIMARY op
UNION
SELECT
OfficeType = 2
,OfficeID = os.OFFICESECONDARY_ID
,OfficeName = os.OFFICESECONDARY_NAME
,EmpID = os.OFFICESECONDARY_EMP_ID
,StartDate = os.OFFICESECONDARY_START
,EndDate = os.OFFICESECONDARY_END
FROM
dbo.OFFICESECONDARY os
) o
INNER JOIN
dbo.EMP e ON e.EMP_ID = o.EmpID
INNER JOIN
#tmp_dates d ON o.StartDate<=d.d AND o.EndDate>=d.d
)
SELECT
EmpID
,EmpName
,OfficeType
,OfficeName
,TotalHoursWorked = SUM(HoursWorked)
FROM
CTE
GROUP BY
EmpID
,EmpName
,OfficeType
,OfficeID
,OfficeName
ORDER BY
EmpID
,OfficeName
I first generate a temp table with every date between the minimum and maximum dates.
Then I union both office tables (why do you have two tables anyway?) in a CTE that returns, per employee, date and office, the number of hours worked in that office (8 divided by the number of offices the employee worked in on that day).
Then I sum this data to get the total hours grouped by employee and office.
Maybe there is a simpler solution; this was the first one that came to mind.
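The per-day split described above can be sketched outside SQL as follows. This is a minimal Python illustration with two hypothetical, short office periods (inclusive end dates). Like the query, it counts 8 hours for every calendar day, weekends included; the `hours_per_office` helper is a name invented for this example.

```python
from collections import defaultdict
from datetime import date, timedelta

HOURS_PER_DAY = 8.0

# Hypothetical periods for one employee: (office, start, end), end inclusive.
periods = [
    ("New York", date(2014, 8, 1), date(2014, 8, 10)),
    ("London", date(2014, 8, 6), date(2014, 8, 15)),
]


def hours_per_office(periods):
    """Split each day's hours equally among the offices active on that day."""
    # Step 1: expand every period into individual days (the date table join).
    offices_on_day = defaultdict(list)
    for office, start, end in periods:
        day = start
        while day <= end:
            offices_on_day[day].append(office)
            day += timedelta(days=1)

    # Step 2: share the day's hours among the active offices and sum per office.
    totals = defaultdict(float)
    for offices in offices_on_day.values():
        share = HOURS_PER_DAY / len(offices)
        for office in offices:
            totals[office] += share
    return dict(totals)


totals = hours_per_office(periods)
print(totals)
```

In this sample each office has five days alone (40 hours) and five shared days (20 hours), so both end up with 60 hours.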
This should give you a head start:
select datediff(wk,OFFICEPRIMARY_START,OFFICEPRIMARY_END) * 40 as HoursWorkedPrimary
,datediff(wk,OFFICESECONDARY_START,OFFICESECONDARY_END) * 40 as HoursWorkedSecondary
,EMP_NAME
,OFFICEPRIMARY_NAME,OFFICEPRIMARY_START,OFFICEPRIMARY_END
,OFFICESECONDARY_NAME,OFFICESECONDARY_START,OFFICESECONDARY_END
from [EMP]
inner join OFFICEPRIMARY as op on op.OFFICEPRIMARY_EMP_ID = EMP.EMP_ID
inner join OFFICESECONDARY as os on os.OFFICESECONDARY_EMP_ID = EMP.EMP_ID
The link below should help point you in the right direction to identifying how the dates overlap.
Count days in date range with set of exclusions which may overlap