Cleaning up duplicate chronological values - sql

I am trying to clean up some chronological data by removing duplicate rows.
Example Table:
+--------+------------+----------------+
| emp_id | department | effective_date |
+--------+------------+----------------+
| 1 | 50 | 2015-04-01 |
| 1 | 50 | 2015-05-22 |
| 1 | null | 2015-07-04 |
| 1 | null | 2015-07-24 |
| 1 | null | 2015-07-30 |
| 1 | 50 | 2015-09-07 |
| 1 | 50 | 2016-01-16 |
| 1 | null | 2016-04-23 |
| 2 | 60 | 2015-01-20 |
| 2 | 60 | 2015-11-22 |
| 2 | 60 | 2016-07-20 |
| 3 | 50 | 2015-04-02 |
| 3 | 50 | 2015-07-15 |
| 3 | 60 | 2016-01-25 |
+--------+------------+----------------+
As you can see, the same individual may have several consecutive rows with the same department but different effective_dates. I want to clean this up with a query that keeps only the first date for each department change. However, I don't want to remove instances where someone went from department 50 to null and then back to 50, as those are actual changes in position.
Example Output:
+--------+------------+----------------+
| emp_id | department | effective_date |
+--------+------------+----------------+
| 1 | 50 | 2015-04-01 |
| 1 | null | 2015-07-04 |
| 1 | 50 | 2015-09-07 |
| 1 | null | 2016-04-23 |
| 2 | 60 | 2015-01-20 |
| 3 | 50 | 2015-04-02 |
| 3 | 60 | 2016-01-25 |
+--------+------------+----------------+
How can I achieve this?

My solution is:
CREATE TABLE #myTable (emp_id INT, department INT, effective_date DATE);
INSERT INTO #myTable VALUES
(1, 50 , '2015-04-01'),
(1, 50 , '2015-05-22'),
(1, null, '2015-07-04'),
(1, null, '2015-07-24'),
(1, null, '2015-07-30'),
(1, 50 , '2015-09-07'),
(1, 50 , '2016-01-16'),
(1, null, '2016-04-23'),
(2, 60 , '2015-01-20'),
(2, 60 , '2015-11-22'),
(2, 60 , '2016-07-20'),
(3, 50 , '2015-04-02'),
(3, 50 , '2015-07-15'),
(3, 60 , '2016-01-25')
;WITH T AS (
SELECT *,
RN = ROW_NUMBER() OVER(PARTITION BY emp_id ORDER BY effective_date)
FROM #myTable
)
SELECT T1.emp_id, T1.department, T1.effective_date
FROM
T T1
LEFT JOIN T T2 ON T1.emp_id = T2.emp_id AND T1.RN -1 = T2.RN
WHERE (CASE WHEN ISNULL(T1.department,'') = ISNULL(T2.department,'') THEN 1 ELSE 0 END) = 0
ORDER BY T1.emp_id, T1.RN
Result:
emp_id department effective_date
----------- ----------- --------------
1 50 2015-04-01
1 NULL 2015-07-04
1 50 2015-09-07
1 NULL 2016-04-23
2 60 2015-01-20
3 50 2015-04-02
3 60 2016-01-25
(7 row(s) affected)
To delete the duplicate values:
;WITH T AS (
SELECT *,
RN = ROW_NUMBER() OVER(PARTITION BY emp_id ORDER BY effective_date)
FROM #myTable
)
DELETE T1
FROM
T T1
LEFT JOIN T T2 ON T1.emp_id = T2.emp_id AND T1.RN -1 = T2.RN
WHERE ( CASE
WHEN ISNULL(T1.department,'') <> ISNULL(T2.department,'') THEN 1
ELSE 0 END ) = 0
An alternative for the WHERE clause:
WHERE ( CASE WHEN T1.department <> T2.department
OR (T1.department IS NULL AND T2.department IS NOT NULL)
OR (T2.department IS NULL AND T1.department IS NOT NULL)
THEN 1 ELSE 0 END ) = 0

This was tougher than expected:
create table #temp (emp_id int, department int, effective_date date)
insert into #temp
values
(1,50,'2015-04-01')
, (1,50,'2015-05-22')
, (1, null ,'2015-07-04')
, (1, null ,'2015-07-24')
, (1, null ,'2015-07-30')
, (1,50,'2015-09-07')
, (1,50,'2016-01-16')
, (1, null ,'2016-04-23')
, (2,60,'2015-01-20')
, (2,60,'2015-11-22')
, (2,60,'2016-07-20')
, (3,50,'2015-04-02')
, (3,50,'2015-07-15')
, (3,60,'2016-01-25')
;with cte as
(
--Please note: I am changing null to -1 for comparison
select emp_id,isnull(department,-1) department,effective_date
,row_number() over (partition by emp_id order by effective_date) rn
from #temp
)
,cte2 as
(
--Compare to next record
select cte.*
,ctelast.emp_id cte2Emp
,ctelast.department cte2dept
,ctelast.effective_date cte2ED
,isSame = case when cte.department=ctelast.department then 1 else 0 end
from cte
join cte ctelast
on cte.emp_id=ctelast.emp_id and cte.rn = ctelast.rn-1
)
/*
Result of above:
emp_id department effective_date rn cte2Emp cte2dept cte2ED isSame
1 50 2015-04-01 1 1 50 2015-05-22 1
1 50 2015-05-22 2 1 -1 2015-07-04 0
1 -1 2015-07-04 3 1 -1 2015-07-24 1
1 -1 2015-07-24 4 1 -1 2015-07-30 1
1 -1 2015-07-30 5 1 50 2015-09-07 0
1 50 2015-09-07 6 1 50 2016-01-16 1
1 50 2016-01-16 7 1 -1 2016-04-23 0
2 60 2015-01-20 1 2 60 2015-11-22 1
2 60 2015-11-22 2 2 60 2016-07-20 1
3 50 2015-04-02 1 3 50 2015-07-15 1
3 50 2015-07-15 2 3 60 2016-01-25 0
*/
--Now you want both the first record and then any changes
select emp_id,department,effective_date from cte2 where rn=1
union all
select cte2emp,cte2dept,cte2.cte2ED from cte2 where isSame=0
order by 1,3
/*
result:
emp_id department effective_date
1 50 2015-04-01
1 -1 2015-07-04
1 50 2015-09-07
1 -1 2016-04-23
2 60 2015-01-20
3 50 2015-04-02
3 60 2016-01-25
*/
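For comparison, here is a minimal sketch of the same "keep the first row of each run" idea using LAG instead of a self-join. It is wrapped in Python's sqlite3 module only so that it can be run without a SQL Server instance; the table name emp_dept is an assumption, and SQLite's IS NOT is used as a NULL-safe "differs from" test. In T-SQL the NULL handling would have to be spelled out explicitly, as in the answers above.

```python
import sqlite3

rows = [
    (1, 50, "2015-04-01"), (1, 50, "2015-05-22"), (1, None, "2015-07-04"),
    (1, None, "2015-07-24"), (1, None, "2015-07-30"), (1, 50, "2015-09-07"),
    (1, 50, "2016-01-16"), (1, None, "2016-04-23"), (2, 60, "2015-01-20"),
    (2, 60, "2015-11-22"), (2, 60, "2016-07-20"), (3, 50, "2015-04-02"),
    (3, 50, "2015-07-15"), (3, 60, "2016-01-25"),
]

conn = sqlite3.connect(":memory:")
# emp_dept is a hypothetical table name standing in for the question's table.
conn.execute("CREATE TABLE emp_dept (emp_id INT, department INT, effective_date TEXT)")
conn.executemany("INSERT INTO emp_dept VALUES (?, ?, ?)", rows)

changes = conn.execute("""
    SELECT emp_id, department, effective_date
    FROM (
        SELECT emp_id, department, effective_date,
               -- department of the previous row for the same employee
               LAG(department) OVER (PARTITION BY emp_id ORDER BY effective_date) AS prev_department,
               ROW_NUMBER()    OVER (PARTITION BY emp_id ORDER BY effective_date) AS rn
        FROM emp_dept
    )
    -- keep the first row per employee, plus every row whose department
    -- differs from the previous one (IS NOT treats NULL as a comparable value)
    WHERE rn = 1 OR department IS NOT prev_department
    ORDER BY emp_id, effective_date
""").fetchall()

for row in changes:
    print(row)
```

Checking rn = 1 separately means an employee whose very first row has a NULL department is still kept, which a plain comparison against a missing previous row would not guarantee.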

Related

Join these 2 tables to get this specific output?

I am using SQL Server 2014. I am working on a query where I want to extract information from 2 specific tables to create my final output.
An extract of the 2 tables (Rebookings and ResaList) are given below.
Rebookings Table (each CancelledID has its corresponding RebookingID):
CancelledID RebookingID
102 541
250 351
129 800
...
ResaList Table:
ID Property ArrivalDate RN
100 X 2020-05-22 9
102 X 2020-03-05 7
250 D 2020-04-12 10
129 E 2020-03-15 8
351 D 2020-09-23 5
541 X 2020-06-01 7
800 E 2020-07-11 8
...
Here is my desired output:
ID Property ArrivalDate RN RebookingID Rebooking_ArrivalDate Rebooking_RN Tag
102 X 2020-03-05 7 541 2020-06-01 7 Cancelled
250 D 2020-04-12 10 351 2020-09-23 5 Re-booked
129 E 2020-03-15 8 800 2020-07-11 8 Re-booked
This is what I have done so far:
USE [MyDatabase]
select
a.[ID],
a.[Property],
a.[ArrivalDate],
a.[RN],
b.[RebookingID],
(CASE WHEN a.[ID] in (SELECT [CancelledID] FROM [Rebookings]) THEN 'Re-booked'
ELSE 'Cancelled'
END) as [Tag]
from [ResaList] a
LEFT JOIN [Rebookings] b on b.[CancelledID] = a.[ID]
where a.[ID] in (SELECT [CancelledID] FROM [Rebookings])
GROUP BY a.[ID], a.[Property], a.[ArrivalDate], a.[RN], b.[RebookingID]
I am stuck at how to bring Rebooking_ArrivalDate and Rebooking_RN into the above output. Any help would be appreciated.
I have tried this in SQL Server.
CREATE TABLE #cancelled (cancelledId int, rebookingId int)
INSERT into #cancelled values
(102 , 541 ),
( 250 , 351 ),
( 129 , 800 );
CREATE TABLE #ResaList (Id int, property CHAR(1), ArrivalDate date, RN int)
INSERT INTO #ResaList values
(100 ,'X','2020-05-22', 9),
(102 ,'X','2020-03-05', 7),
(250 ,'D','2020-04-12', 10),
(129 ,'E','2020-03-15', 8),
(351 ,'D','2020-09-23', 5),
(541 ,'X','2020-06-01', 7),
(800 ,'E','2020-07-11', 8);
SELECT r.Id, r.Property, r.ArrivalDate, r.rn,
rebooking.RebookingId
, rebooking.Rebooking_ArrivalDate
, rebooking.Rebooking_RN
, CASE WHEN rebooking.RebookingId IS NOT NULL THEN 'Re-booked' ELSE 'cancelled' end as tag
FROM #ResaList as r
OUTER APPLY (SELECT rc.Id, rc.ArrivalDate, rc.RN
FROM #cancelled as c
INNER JOIN #ResaList AS rc
ON rc.Id = c.rebookingId
WHERE c.cancelledId = r.Id) as rebooking(RebookingId, Rebooking_ArrivalDate, Rebooking_RN)
+-----+----------+-------------+----+-------------+-----------------------+--------------+-----------+
| Id | Property | ArrivalDate | rn | RebookingId | Rebooking_ArrivalDate | Rebooking_RN | tag |
+-----+----------+-------------+----+-------------+-----------------------+--------------+-----------+
| 100 | X | 2020-05-22 | 9 | NULL | NULL | NULL | cancelled |
| 102 | X | 2020-03-05 | 7 | 541 | 2020-06-01 | 7 | Re-booked |
| 250 | D | 2020-04-12 | 10 | 351 | 2020-09-23 | 5 | Re-booked |
| 129 | E | 2020-03-15 | 8 | 800 | 2020-07-11 | 8 | Re-booked |
| 351 | D | 2020-09-23 | 5 | NULL | NULL | NULL | cancelled |
| 541 | X | 2020-06-01 | 7 | NULL | NULL | NULL | cancelled |
| 800 | E | 2020-07-11 | 8 | NULL | NULL | NULL | cancelled |
+-----+----------+-------------+----+-------------+-----------------------+--------------+-----------+
Using a SELECT in an inner query to check for existing records is a poor way to do this; on larger tables it can perform badly.
Here is a more compact way:
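select
a.ID,
a.Property,
a.ArrivalDate,
a.RN,
b.RebookingID,
-- Rebookings only holds the two IDs, so the rebooking's own
-- date and RN come from a second join to ResaList
c.ArrivalDate as Rebooking_ArrivalDate,
c.RN as Rebooking_RN,
coalesce(b.tag,'Cancelled') as tag
from ResaList a left join
(
select *, 'Re-booked' as tag from Rebookings
) b
on a.ID=b.CancelledID
left join ResaList c
on c.ID=b.RebookingID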
You can select the same table twice by using aliases.
SELECT
*
,(CASE WHEN b.[ID] IS NOT NULL THEN 'Re-booked'
ELSE 'Cancelled'
END) as [Tag]
FROM ResaList AS a
LEFT JOIN Rebookings ON a.ID = Rebookings.CancelledID
LEFT JOIN ResaList AS b ON b.ID = Rebookings.RebookingID
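As a rough, runnable sketch of the "same table twice" idea, the snippet below loads the sample data into an in-memory SQLite database (via Python's sqlite3, purely so it can be tried without SQL Server) and starts from Rebookings, joining ResaList once for the cancelled reservation and once for its rebooking. The aliases a, b and r are arbitrary, and inner joins are used on the assumption that only rebooked reservations are wanted, as in the desired output.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Rebookings (CancelledID INT, RebookingID INT);
    INSERT INTO Rebookings VALUES (102, 541), (250, 351), (129, 800);

    CREATE TABLE ResaList (ID INT, Property TEXT, ArrivalDate TEXT, RN INT);
    INSERT INTO ResaList VALUES
        (100, 'X', '2020-05-22', 9), (102, 'X', '2020-03-05', 7),
        (250, 'D', '2020-04-12', 10), (129, 'E', '2020-03-15', 8),
        (351, 'D', '2020-09-23', 5), (541, 'X', '2020-06-01', 7),
        (800, 'E', '2020-07-11', 8);
""")

rebooked = conn.execute("""
    SELECT a.ID, a.Property, a.ArrivalDate, a.RN,
           r.RebookingID,
           b.ArrivalDate AS Rebooking_ArrivalDate,
           b.RN          AS Rebooking_RN
    FROM Rebookings AS r
    JOIN ResaList AS a ON a.ID = r.CancelledID   -- the cancelled reservation
    JOIN ResaList AS b ON b.ID = r.RebookingID   -- the reservation that replaced it
    ORDER BY a.ArrivalDate
""").fetchall()

for row in rebooked:
    print(row)
```

Switching the joins to LEFT JOINs from ResaList, as in the answer above, would also keep reservations that were never rebooked.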

SQL Server Get all Birthday Years

I have a table in SQL Server that is Composed of
ID, B_Day
1, 1977-02-20
2, 2001-03-10
...
I want to add rows to this table for each year of a birthday, up to the current birthday year.
i.e:
ID, B_Day
1,1977-02-20
1,1978-02-20
1,1979-02-20
...
1,2020-02-20
2, 2001-03-10
2, 2002-03-10
...
2, 2019-03-10
I'm struggling to determine what the best strategy is for accomplishing this. I thought about recursively self-joining, but that creates far too many layers. Any suggestions?
The following should work
with row_gen
as (select top 200 row_number() over(order by name)-1 as rnk
from master..spt_values
)
select a.id,a.b_day,dateadd(year,rnk,b_day) incr_b_day
from dbo.t a
join row_gen b
on dateadd(year,b.rnk,a.b_day)<=getdate()
https://dbfiddle.uk/?rdbms=sqlserver_2017&fiddle=0d06c95e1914ca45ca192d0d192bd2e0
You can use a recursive approach:
with cte as (
select t.id, t.b_day, convert(date, getdate()) as mx_dt
from table t
union all
select c.id, dateadd(year, 1, c.b_day), c.mx_dt
from cte c
where dateadd(year, 1, c.b_day) < c.mx_dt
)
select c.id, c.b_day
from cte c
order by c.id, c.b_day;
The default recursion limit is 100 levels; to go beyond that, add the query hint OPTION (MAXRECURSION 0), which removes the limit.
If your dataset is not too big, one option is to use a recursive query:
with cte as (
select id, b_day bday0, b_day, 1 lvl from mytable
union all
select
id,
bday0,
dateadd(year, lvl, bday0), lvl + 1
from cte
where dateadd(year, lvl, bday0) <= getdate()
)
select id, b_day from cte order by id, b_day
Demo on DB Fiddle:
id | b_day
-: | :---------
1 | 1977-02-20
1 | 1978-02-20
1 | 1979-02-20
1 | 1980-02-20
1 | 1981-02-20
1 | 1982-02-20
1 | 1983-02-20
1 | 1984-02-20
1 | 1985-02-20
1 | 1986-02-20
1 | 1987-02-20
1 | 1988-02-20
1 | 1989-02-20
1 | 1990-02-20
1 | 1991-02-20
1 | 1992-02-20
1 | 1993-02-20
1 | 1994-02-20
1 | 1995-02-20
1 | 1996-02-20
1 | 1997-02-20
1 | 1998-02-20
1 | 1999-02-20
1 | 2000-02-20
1 | 2001-02-20
1 | 2002-02-20
1 | 2003-02-20
1 | 2004-02-20
1 | 2005-02-20
1 | 2006-02-20
1 | 2007-02-20
1 | 2008-02-20
1 | 2009-02-20
1 | 2010-02-20
1 | 2011-02-20
1 | 2012-02-20
1 | 2013-02-20
1 | 2014-02-20
1 | 2015-02-20
1 | 2016-02-20
1 | 2017-02-20
1 | 2018-02-20
1 | 2019-02-20
1 | 2020-02-20
2 | 2001-03-01
2 | 2002-03-01
2 | 2003-03-01
2 | 2004-03-01
2 | 2005-03-01
2 | 2006-03-01
2 | 2007-03-01
2 | 2008-03-01
2 | 2009-03-01
2 | 2010-03-01
2 | 2011-03-01
2 | 2012-03-01
2 | 2013-03-01
2 | 2014-03-01
2 | 2015-03-01
2 | 2016-03-01
2 | 2017-03-01
2 | 2018-03-01
2 | 2019-03-01
2 | 2020-03-01
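If you want to experiment with the recursive version without a SQL Server instance, here is a small sketch of the same query shape in SQLite, driven from Python's sqlite3. The table name mytable follows the answer above; the fixed cutoff date is an assumption standing in for getdate() so that the result is repeatable, and SQLite's date() modifiers replace DATEADD.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE mytable (id INT, b_day TEXT);
    INSERT INTO mytable VALUES (1, '1977-02-20'), (2, '2001-03-10');
""")

cutoff = "2020-06-01"  # fixed stand-in for getdate(), chosen for illustration

birthdays = conn.execute("""
    WITH RECURSIVE cte(id, bday0, b_day, lvl) AS (
        -- anchor: the original birthday of every person
        SELECT id, b_day, b_day, 1 FROM mytable
        UNION ALL
        -- recursive step: always add lvl years to the ORIGINAL date
        SELECT id, bday0, date(bday0, '+' || lvl || ' years'), lvl + 1
        FROM cte
        WHERE date(bday0, '+' || lvl || ' years') <= :cutoff
    )
    SELECT id, b_day FROM cte ORDER BY id, b_day
""", {"cutoff": cutoff}).fetchall()

print(len(birthdays), birthdays[0], birthdays[-1])
```

Adding the offset to the original date each time, rather than adding one year to the previous row, keeps the arithmetic anchored to the real birthday.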

Count previous row only if > 2 days later

In SQL Server 2012 (using Management Studio), I need to display the count of distinct client numbers (CN) for re-entries, grouped by Type, like this:
Type CountOfCN
5 1
10 3
Only a RE-entry counts (ENTRY_NO 1 never counts), and it has to start more than 2 days after the end of the previous entry for that clientnumber. ENTRY_NO 2 counts if its startdate is more than 2 days after the enddate of ENTRY_NO 1, and so on for ENTRY_NO 3, 4, 5.
I got ENTRY_NO by doing a ROW_NUMBER function when I created the table. I have no idea how to go about creating a datediff or dateadd function (?) to look at the previous row's enddate and calculate it with my startdate for each CN?
Here is my table:
CN STARTDATE ENDDATE TYPE ENTRY_NO
1 1/1/2018 1/20/2018 10 1
1 1/21/2018 1/30/2018 5 2
1 2/3/2018 NULL 10 3
2 1/1/2018 1/20/2018 10 1
2 1/27/2018 1/30/2018 10 2
3 1/1/2018 1/20/2018 5 1
3 1/27/2018 1/30/2018 10 2
3 2/10/2018 2/20/2018 5 3
4 1/7/2018 1/30/2018 5 1
5 1/27/2018 1/30/2018 5 1
5 1/31/2018 NULL 5 2
So the rows that should be in the results are ENTRY_NO 3 for CN 1, ENTRY_NO 2 for CN 2, and ENTRY_NO 2 & 3 for CN 3.
Only the last entry may or may not have a NULL enddate.
Using the LAG window function you can get the previous enddate.
SELECT *
FROM
(
SELECT * ,
LAG(ENDDATE) OVER (PARTITION BY CN ORDER BY STARTDATE) AS prevEndDate
FROM yourtable
) q
WHERE DATEDIFF(d, prevEndDate, STARTDATE) > 2
AND ENDDATE IS NOT NULL
Inner join the table to itself on the conditions you want to enforce:
Can't be Entry_No 1
The Entry_No on one side is one greater than on the other side
Previous Entry must be more than 2 days earlier
Both sides of the join have the same CN
Use that join to create a CTE or derived table, and then SELECT from it, grouping by Type and getting the COUNT(*)
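The steps above can be sketched roughly as follows. This is only an illustration, run against SQLite through Python's sqlite3 so it works without SQL Server; the table name entries is an assumption, and julianday() differences stand in for DATEDIFF. Joining each row to the row with ENTRY_NO one lower also takes care of the "can't be Entry_No 1" rule, because ENTRY_NO 1 has no previous row to join to.

```python
import sqlite3

entries = [
    (1, "2018-01-01", "2018-01-20", 10, 1),
    (1, "2018-01-21", "2018-01-30", 5, 2),
    (1, "2018-02-03", None, 10, 3),
    (2, "2018-01-01", "2018-01-20", 10, 1),
    (2, "2018-01-27", "2018-01-30", 10, 2),
    (3, "2018-01-01", "2018-01-20", 5, 1),
    (3, "2018-01-27", "2018-01-30", 10, 2),
    (3, "2018-02-10", "2018-02-20", 5, 3),
    (4, "2018-01-07", "2018-01-30", 5, 1),
    (5, "2018-01-27", "2018-01-30", 5, 1),
    (5, "2018-01-31", None, 5, 2),
]

conn = sqlite3.connect(":memory:")
# entries is a hypothetical table name for the question's data.
conn.execute(
    "CREATE TABLE entries (CN INT, STARTDATE TEXT, ENDDATE TEXT, TYPE INT, ENTRY_NO INT)"
)
conn.executemany("INSERT INTO entries VALUES (?, ?, ?, ?, ?)", entries)

counts = conn.execute("""
    SELECT cur.TYPE, COUNT(DISTINCT cur.CN) AS CountOfCN
    FROM entries AS cur
    JOIN entries AS prev
      ON prev.CN = cur.CN                     -- same client on both sides
     AND prev.ENTRY_NO = cur.ENTRY_NO - 1     -- prev is the entry just before cur
    -- re-entry must start more than 2 days after the previous entry ended
    WHERE julianday(cur.STARTDATE) - julianday(prev.ENDDATE) > 2
    GROUP BY cur.TYPE
    ORDER BY cur.TYPE
""").fetchall()

print(counts)
```

On the sample data this gives 1 client for Type 5 and 3 clients for Type 10, which are the counts asked for in the question.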
So this ended up being more involved than I first thought, but here it goes...
You can run this example in SSMS.
Create a temporary table matching your definition above:
CREATE TABLE #data ( CN INT, STARTDATE DATETIME, ENDDATE DATETIME, [TYPE] INT, ENTRY_NO INT );
Insert data given:
INSERT INTO #data ( CN, STARTDATE, ENDDATE, [TYPE], ENTRY_NO ) VALUES
( 1, '1/1/2018', '1/20/2018', 10, 1 )
, ( 1, '1/21/2018', '1/30/2018', 5, 2 )
, ( 1, '2/3/2018', NULL, 10, 3 )
, ( 2, '1/1/2018', '1/20/2018', 10, 1 )
, ( 2, '1/27/2018', '1/30/2018', 10, 2 )
, ( 3, '1/1/2018', '1/20/2018', 5, 1 )
, ( 3, '1/27/2018', '1/30/2018', 10, 2 )
, ( 3, '2/10/2018', '2/20/2018', 5, 3 )
, ( 4, '1/7/2018', '1/30/2018', 5, 1 )
, ( 5, '1/27/2018', '1/30/2018', 5, 1 )
, ( 5, '1/31/2018', NULL, 5, 2 );
Confirm inserted data:
+----+-------------------------+-------------------------+------+----------+
| CN | STARTDATE | ENDDATE | TYPE | ENTRY_NO |
+----+-------------------------+-------------------------+------+----------+
| 1 | 2018-01-01 00:00:00.000 | 2018-01-20 00:00:00.000 | 10 | 1 |
| 1 | 2018-01-21 00:00:00.000 | 2018-01-30 00:00:00.000 | 5 | 2 |
| 1 | 2018-02-03 00:00:00.000 | NULL | 10 | 3 |
| 2 | 2018-01-01 00:00:00.000 | 2018-01-20 00:00:00.000 | 10 | 1 |
| 2 | 2018-01-27 00:00:00.000 | 2018-01-30 00:00:00.000 | 10 | 2 |
| 3 | 2018-01-01 00:00:00.000 | 2018-01-20 00:00:00.000 | 5 | 1 |
| 3 | 2018-01-27 00:00:00.000 | 2018-01-30 00:00:00.000 | 10 | 2 |
| 3 | 2018-02-10 00:00:00.000 | 2018-02-20 00:00:00.000 | 5 | 3 |
| 4 | 2018-01-07 00:00:00.000 | 2018-01-30 00:00:00.000 | 5 | 1 |
| 5 | 2018-01-27 00:00:00.000 | 2018-01-30 00:00:00.000 | 5 | 1 |
| 5 | 2018-01-31 00:00:00.000 | NULL | 5 | 2 |
+----+-------------------------+-------------------------+------+----------+
Run SQL to get type count given your business rules:
ENTRY_NO must be greater than 1
Current CN ENDDATE must be greater than 2 days from previous ENDDATE
T-SQL:
SELECT
[TYPE], COUNT( DISTINCT CN ) AS ClientCount
FROM #data
WHERE
CN IN (
SELECT DISTINCT CN FROM (
SELECT
dat.CN
, dat.ENTRY_NO
, dat.[TYPE]
, DATEDIFF( DD
, LAG( ENDDATE, 1, NULL ) OVER ( PARTITION BY CN ORDER BY CN, ENDDATE ) -- gets enddate for previous CN entry
, ENDDATE
) AS DayDiff
FROM #data dat
) AS Clients
WHERE
Clients.ENTRY_NO >= 2
AND Clients.DayDiff > 2
)
GROUP BY
[TYPE]
ORDER BY
[TYPE];
Returns:
+------+-------------+
| TYPE | ClientCount |
+------+-------------+
| 5 | 2 |
| 10 | 3 |
+------+-------------+
A quick look at the IN subquery shows us that CNs 1, 2, and 3 will be included during the "TYPE" count.
SELECT
dat.CN
, dat.ENTRY_NO
, dat.[TYPE]
, DATEDIFF( DD
, LAG( ENDDATE, 1, NULL ) OVER ( PARTITION BY CN ORDER BY CN, ENDDATE ) -- gets enddate for previous CN entry
, ENDDATE
) AS DayDiff
FROM #data dat
ORDER BY
dat.CN, dat.ENTRY_NO;
+----+----------+------+---------+
| CN | ENTRY_NO | TYPE | DayDiff |
+----+----------+------+---------+
| 1 | 1 | 10 | NULL |
| 1 | 2 | 5 | 10 |
| 1 | 3 | 10 | NULL |
| 2 | 1 | 10 | NULL |
| 2 | 2 | 10 | 10 |
| 3 | 1 | 5 | NULL |
| 3 | 2 | 10 | 10 |
| 3 | 3 | 5 | 21 |
| 4 | 1 | 5 | NULL |
| 5 | 1 | 5 | NULL |
| 5 | 2 | 5 | NULL |
+----+----------+------+---------+

Get permutations of ordered sets of N values

I have a table that consists of a set of codes for an item. Each code's group is defined by group_id. The tables are defined as follows:
CREATE TABLE item_code (
id int PRIMARY KEY NOT NULL IDENTITY (1,1),
item_id int DEFAULT NULL,
group_id int NOT NULL,
code varchar(50) NOT NULL
);
CREATE TABLE groups (
id int PRIMARY KEY NOT NULL IDENTITY (1,1),
name varchar(50) NOT NULL,
[order] int NOT NULL
)
For each item_id in the table, I need to select 1 code from each group_id ordered by the group's order. For example:
INSERT INTO groups (name, [order]) VALUES ('one', 10), ('two', 20), ('three', 30);
INSERT INTO item_code (item_id, group_id, [code])
VALUES
(99, 1, 'code1-1'),
(99, 1, 'code1-2'),
(99, 2, 'code2-1'),
(99, 2, 'code2-2'),
(99, 3, 'code3-1'),
(100,1, 'another-code');
would result in the set:
item_id code_combination
99 "code1-1"
99 "code1-2"
99 "code2-1"
99 "code2-2"
99 "code3-1"
99 "code1-1, code2-1"
99 "code1-1, code2-2"
99 "code1-2, code2-1"
99 "code1-2, code2-2"
99 "code1-1, code3-1"
99 "code1-2, code3-1"
99 "code2-1, code3-1"
99 "code2-2, code3-1"
99 "code1-1, code2-1, code3-1"
99 "code1-2, code2-1, code3-1"
99 "code1-1, code2-2, code3-1"
99 "code1-2, code2-2, code3-1"
100 "another-code"
The order of the actual results does not matter. I included a row for item_id == 100 just to show that results for all item_id should be included.
What I've done so far:
I've built a CTE that gets combinations of codes, but it does not respect item_id, groups or order, and that's where I'm stuck:
;WITH cte ( combination, curr ) AS (
SELECT CAST(ic.code AS VARCHAR(MAX)), ic.id
FROM item_code ic
UNION ALL
SELECT CAST( c.combination + ',' + CAST(ic.code AS VARCHAR(10) ) AS VARCHAR(MAX) ), ic.id
FROM item_code ic
INNER JOIN
cte c
ON ( c.curr < ic.id )
)
SELECT combination FROM cte
UPDATE: I have a slightly more complicated schema than what I originally posted, and have built the schema in this fiddle. The idea is the same, it's just that "order" is defined on a different table.
Adding a little more to your recursive cte, expanding the final join conditions, as well as some additional columns:
;with cte as (
select
ic.id
, ic.item_id
, ic.group_id
, g.[order]
, level = 0
, combination = cast(ic.code as varchar(max))
from item_code ic
inner join groups g
on ic.group_id = g.id
union all
select
ic.id
, ic.item_id
, ic.group_id
, g.[order]
, level = c.level + 1
, combination = cast( c.combination + ',' + cast(ic.code as varchar(10) ) as varchar(max) )
from item_code ic
inner join groups g
on ic.group_id = g.id
inner join cte c
on c.id < ic.id
and c.[order] < g.[order]
and c.item_id = ic.item_id
)
select *
from cte
order by item_id, level, combination
rextester demo: http://rextester.com/PJC44281
returns:
+----+---------+----------+-------+-------+-------------------------+
| id | item_id | group_id | order | level | combination |
+----+---------+----------+-------+-------+-------------------------+
| 1 | 99 | 1 | 10 | 0 | code1-1 |
| 2 | 99 | 1 | 10 | 0 | code1-2 |
| 3 | 99 | 2 | 20 | 0 | code2-1 |
| 4 | 99 | 2 | 20 | 0 | code2-2 |
| 5 | 99 | 3 | 30 | 0 | code3-1 |
| 3 | 99 | 2 | 20 | 1 | code1-1,code2-1 |
| 4 | 99 | 2 | 20 | 1 | code1-1,code2-2 |
| 5 | 99 | 3 | 30 | 1 | code1-1,code3-1 |
| 3 | 99 | 2 | 20 | 1 | code1-2,code2-1 |
| 4 | 99 | 2 | 20 | 1 | code1-2,code2-2 |
| 5 | 99 | 3 | 30 | 1 | code1-2,code3-1 |
| 5 | 99 | 3 | 30 | 1 | code2-1,code3-1 |
| 5 | 99 | 3 | 30 | 1 | code2-2,code3-1 |
| 5 | 99 | 3 | 30 | 2 | code1-1,code2-1,code3-1 |
| 5 | 99 | 3 | 30 | 2 | code1-1,code2-2,code3-1 |
| 5 | 99 | 3 | 30 | 2 | code1-2,code2-1,code3-1 |
| 5 | 99 | 3 | 30 | 2 | code1-2,code2-2,code3-1 |
| 6 | 100 | 1 | 10 | 0 | another-code |
+----+---------+----------+-------+-------+-------------------------+
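For anyone who wants to try the idea without SQL Server, here is a hedged sketch of a similar recursive CTE in SQLite, run through Python's sqlite3. It is a variation rather than a copy of the query above: it extends a combination only with a code from a group whose order is strictly greater than the last group used, and it does not compare id values at all. The column name last_order is made up for the sketch.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE "groups" (id INTEGER PRIMARY KEY, name TEXT, "order" INT);
    CREATE TABLE item_code (id INTEGER PRIMARY KEY, item_id INT, group_id INT, code TEXT);

    INSERT INTO "groups" (name, "order") VALUES ('one', 10), ('two', 20), ('three', 30);
    INSERT INTO item_code (item_id, group_id, code) VALUES
        (99, 1, 'code1-1'), (99, 1, 'code1-2'), (99, 2, 'code2-1'),
        (99, 2, 'code2-2'), (99, 3, 'code3-1'), (100, 1, 'another-code');
""")

combos = conn.execute("""
    WITH RECURSIVE cte(item_id, last_order, combination) AS (
        -- anchor: every single code is a combination of length one
        SELECT ic.item_id, g."order", ic.code
        FROM item_code AS ic
        JOIN "groups" AS g ON g.id = ic.group_id
        UNION ALL
        -- step: append a code of the same item from a later-ordered group
        SELECT c.item_id, g."order", c.combination || ', ' || ic.code
        FROM cte AS c
        JOIN item_code AS ic ON ic.item_id = c.item_id
        JOIN "groups" AS g ON g.id = ic.group_id AND g."order" > c.last_order
    )
    SELECT item_id, combination FROM cte ORDER BY item_id, combination
""").fetchall()

for row in combos:
    print(row)
```

Because each step must move to a strictly later group, a combination can never contain two codes from the same group, and the recursion stops on its own once the last group has been used.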

SQL - Grouping with aggregation

I have a table (TABLE1) that lists all employees with their Dept IDs, the date they started and the date they were terminated (NULL means they are current employees).
I would like to have a result set (TABLE2) in which every row represents a day, starting from the date the first employee started (20090101 in the sample table below) until today (the Date field). I would like to group the employees by DeptID and calculate the total number of employees for each row of TABLE2.
How do I write this query? Thanks in advance for your help.
TABLE1
DeptID EmployeeID StartDate EndDate
--------------------------------------------
001 123 20100101 20120101
001 124 20090101 NULL
001 234 20110101 20120101
TABLE2
DeptID Date EmployeeCount
-----------------------------------
001 20090101 1
001 20090102 1
... ... 1
001 20100101 2
001 20100102 2
... ... 2
001 20110101 3
001 20110102 3
... ... 3
001 20120101 1
001 20120102 1
001 20120103 1
... ... 1
This will work if you have a date lookup table. You will need to specify the department ID. See it in action.
Query
SELECT d.dt, SUM(e.ecount) AS RunningTotal
FROM dates d
INNER JOIN
(SELECT b.dt,
CASE
WHEN c.ecount IS NULL THEN 0
ELSE c.ecount
END AS ecount
FROM dates b
LEFT JOIN
(SELECT a.DeptID, a.dt, SUM([count]) AS ecount
FROM
(SELECT DeptID, EmployeeID, 1 AS [count], StartDate AS dt FROM TABLE1
UNION ALL
SELECT DeptID, EmployeeID,
CASE
WHEN EndDate IS NOT NULL THEN -1
ELSE 0
END AS [count], EndDate AS dt FROM TABLE1) a
WHERE a.dt IS NOT NULL AND DeptID = 1
GROUP BY a.DeptID, a.dt) c ON c.dt = b.dt) e ON e.dt <= d.dt
GROUP BY d.dt
Result
| DT | RUNNINGTOTAL |
-----------------------------
| 2009-01-01 | 1 |
| 2009-02-01 | 1 |
| 2009-03-01 | 1 |
| 2009-04-01 | 1 |
| 2009-05-01 | 1 |
| 2009-06-01 | 1 |
| 2009-07-01 | 1 |
| 2009-08-01 | 1 |
| 2009-09-01 | 1 |
| 2009-10-01 | 1 |
| 2009-11-01 | 1 |
| 2009-12-01 | 1 |
| 2010-01-01 | 2 |
| 2010-02-01 | 2 |
| 2010-03-01 | 2 |
| 2010-04-01 | 2 |
| 2010-05-01 | 2 |
| 2010-06-01 | 2 |
| 2010-07-01 | 2 |
| 2010-08-01 | 2 |
| 2010-09-01 | 2 |
| 2010-10-01 | 2 |
| 2010-11-01 | 2 |
| 2010-12-01 | 2 |
| 2011-01-01 | 3 |
| 2011-02-01 | 3 |
| 2011-03-01 | 3 |
| 2011-04-01 | 3 |
| 2011-05-01 | 3 |
| 2011-06-01 | 3 |
| 2011-07-01 | 3 |
| 2011-08-01 | 3 |
| 2011-09-01 | 3 |
| 2011-10-01 | 3 |
| 2011-11-01 | 3 |
| 2011-12-01 | 3 |
| 2012-01-01 | 1 |
Schema
CREATE TABLE TABLE1 (
DeptID tinyint,
EmployeeID tinyint,
StartDate date,
EndDate date)
INSERT INTO TABLE1 VALUES
(1, 123, '2010-01-01', '2012-01-01'),
(1, 124, '2009-01-01', NULL),
(1, 234, '2011-01-01', '2012-01-01')
CREATE TABLE dates (
dt date)
INSERT INTO dates VALUES
('2009-01-01'), ('2009-02-01'), ('2009-03-01'), ('2009-04-01'), ('2009-05-01'),
('2009-06-01'), ('2009-07-01'), ('2009-08-01'), ('2009-09-01'), ('2009-10-01'),
('2009-11-01'), ('2009-12-01'), ('2010-01-01'), ('2010-02-01'), ('2010-03-01'),
('2010-04-01'), ('2010-05-01'), ('2010-06-01'), ('2010-07-01'), ('2010-08-01'),
('2010-09-01'), ('2010-10-01'), ('2010-11-01'), ('2010-12-01'), ('2011-01-01'),
('2011-02-01'), ('2011-03-01'), ('2011-04-01'), ('2011-05-01'), ('2011-06-01'),
('2011-07-01'), ('2011-08-01'), ('2011-09-01'), ('2011-10-01'), ('2011-11-01'),
('2011-12-01'), ('2012-01-01')
You need something along these lines:
SELECT *
, ( SELECT COUNT(EmployeeID)
FROM TABLE1 AS f
WHERE t.[Date] BETWEEN f.StartDate AND f.EndDate
) AS EmployeeCount
FROM ( SELECT DeptID
, StartDate AS [Date]
FROM TABLE1
UNION
SELECT DeptID
, EndDate AS [Date]
FROM TABLE1
) AS t
EDIT: since the OP clarified that he wants all the dates, here is the updated solution.
I have excluded an employee from the count if their job ends on that date. If you want to include them, change t.[Date] < f.EndDate to t.[Date] <= f.EndDate in the solution below. I also assume that a NULL value in EndDate means the employee still works for the department.
DECLARE @StartDate DATE = (SELECT MIN(StartDate) FROM Table1)
,@EndDate DATE = (SELECT MAX(EndDate) FROM Table1)
;WITH CTE AS
(
SELECT DISTINCT DeptID,@StartDate AS [Date] FROM Table1
UNION ALL
SELECT c.DeptID, DATEADD(dd,1,c.[Date]) AS [Date] FROM CTE AS c
WHERE c.[Date]<=@EndDate
)
)
SELECT * ,
EmployeeCount=( SELECT COUNT(EmployeeID)
FROM TABLE1 AS f
WHERE f.DeptID=t.DeptID AND t.[Date] >= f.StartDate
AND ( t.[Date] < f.EndDate OR f.EndDate IS NULL )
)
FROM CTE AS t
ORDER BY 1
OPTION ( MAXRECURSION 0 )
Here is a SQL Fiddle demo. I have added another department and added an employee to it.
http://sqlfiddle.com/#!3/5c4ec/1
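As a rough cross-check of the day-by-day approach, the sketch below generates one row per day with a recursive CTE and counts the employees active on each day. It runs on SQLite through Python's sqlite3 rather than on SQL Server. The fixed last_day is an assumption standing in for "today", and an employee is treated as no longer counted on their EndDate, which is what the sample TABLE2 in the question implies.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE TABLE1 (DeptID INT, EmployeeID INT, StartDate TEXT, EndDate TEXT);
    INSERT INTO TABLE1 VALUES
        (1, 123, '2010-01-01', '2012-01-01'),
        (1, 124, '2009-01-01', NULL),
        (1, 234, '2011-01-01', '2012-01-01');
""")

last_day = "2012-01-03"  # fixed stand-in for today's date, for repeatability

headcount = conn.execute("""
    WITH RECURSIVE days(d) AS (
        SELECT MIN(StartDate) FROM TABLE1          -- first day anyone started
        UNION ALL
        SELECT date(d, '+1 day') FROM days WHERE d < :last_day
    )
    SELECT dept.DeptID, days.d, COUNT(e.EmployeeID) AS EmployeeCount
    FROM days
    CROSS JOIN (SELECT DISTINCT DeptID FROM TABLE1) AS dept
    LEFT JOIN TABLE1 AS e
           ON e.DeptID = dept.DeptID
          AND e.StartDate <= days.d
          AND (e.EndDate IS NULL OR days.d < e.EndDate)  -- NULL = still employed
    GROUP BY dept.DeptID, days.d
    ORDER BY dept.DeptID, days.d
""", {"last_day": last_day}).fetchall()

# Index the result by (DeptID, day) for easy spot checks.
by_day = {(dept, day): n for dept, day, n in headcount}
print(by_day[(1, "2009-01-01")], by_day[(1, "2011-06-15")], by_day[(1, "2012-01-01")])
```

The LEFT JOIN keeps days on which a department has nobody at all, so those days show up with a count of 0 instead of disappearing from the result.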