Creating a complete historical timeline from overlapping intervals - SQL

I have the table below, which contains a code, a from date, a to date and an hours value. The problem is that the date intervals overlap. Instead, I want to create a complete historical timeline: when the code is identical and there is an overlap, the hours should be summed, as in the desired result.
Table:
+------+------------+------------+-------+
| code | from       | to         | hours |
+------+------------+------------+-------+
| 1    | 2013-05-01 | 2013-09-30 | 37    |
| 1    | 2013-05-01 | 2014-02-28 | 10    |
| 1    | 2013-10-01 | 9999-12-31 | 5     |
+------+------------+------------+-------+
Desired result:
+------+------------+------------+-------+
| code | from       | to         | hours |
+------+------------+------------+-------+
| 1    | 2013-05-01 | 2013-09-30 | 47    |
| 1    | 2013-10-01 | 2014-02-28 | 15    |
| 1    | 2014-02-29 | 9999-12-31 | 5     |
+------+------------+------------+-------+

Oracle Setup:
CREATE TABLE Table1 ( code, "FROM", "TO", hours ) AS
SELECT 1, DATE '2013-05-01', DATE '2013-09-30', 37 FROM DUAL UNION ALL
SELECT 1, DATE '2013-05-01', DATE '2014-02-28', 10 FROM DUAL UNION ALL
SELECT 1, DATE '2013-10-01', DATE '9999-12-31', 5 FROM DUAL;
Query:
SELECT *
FROM   (
  SELECT code,
         dt AS "FROM",
         LEAD( dt ) OVER ( PARTITION BY code ORDER BY dt ASC, value DESC, ROWNUM ) AS "TO",
         hours
  FROM   (
    SELECT code,
           dt,
           SUM( hours * value ) OVER ( PARTITION BY code ORDER BY dt ASC, value DESC ) AS hours,
           value
    FROM   table1
    UNPIVOT ( dt FOR value IN ( "FROM" AS 1, "TO" AS -1 ) )
  )
)
WHERE  "FROM" + 1 < "TO";
Results:
CODE FROM       TO         HOURS
---- ---------- ---------- -----
   1 2013-05-01 2013-09-30    47
   1 2013-10-01 2014-02-28    15
   1 2014-02-28 9999-12-31     5
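To see why this works, it can help to run just the inner step on its own: the UNPIVOT turns every interval into two boundary rows, weighting its hours by +1 at the start date and -1 at the end date, so the running SUM gives the hours in force at each boundary. A minimal sketch against the Table1 setup above (running_hours is just a name picked for illustration):
SELECT code,
       dt,
       value,   -- +1 at "FROM", -1 at "TO"
       SUM( hours * value ) OVER ( PARTITION BY code ORDER BY dt ASC, value DESC ) AS running_hours
FROM   table1
UNPIVOT ( dt FOR value IN ( "FROM" AS 1, "TO" AS -1 ) )
ORDER BY code, dt, value DESC;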

Related

Oracle SQL: How can I sum every x number of subsequent rows for each row

I have a data table that looks like this:
| Contract Date | Settlement_Price |
|---------------|------------------|
| 01/10/2020    | 50               |
| 01/11/2020    | 10               |
| 01/01/2021    | 20               |
| 01/02/2021    | 30               |
| 01/03/2021    | 50               |
I would like to write a query that sums the next two rows beneath each row. For example, on the first row with contract date 01/10/2020, the sum column would add 10 and 20 to give a result of 30. On the next row, the sum column would add 20 and 30 to give 50, and so on. The resulting table would look like this:
| Contract Date | Settlement_Price | Sum Column |
|---------------|------------------|------------|
| 01/10/2020    | 50               | 30         |
| 01/11/2020    | 10               | 50         |
| 01/01/2021    | 20               | 80         |
| 01/02/2021    | 30               |            |
| 01/03/2021    | 50               |            |
Could anyone please help me with a query that does this not just for 2 subsequent rows but for x subsequent rows?
So far I had tried SUM(Settlement_Price) OVER (ORDER BY Contract_date ROWS BETWEEN 3 PRECEDING AND CURRENT ROW) - preceding rows and the current row are of course not what I need, but that is as far as I had got.
You can use the SUM analytic function:
SELECT contract_date,
       settlement_price,
       CASE COUNT(*) OVER (
              ORDER BY contract_date ROWS BETWEEN 1 FOLLOWING AND 2 FOLLOWING
            )
         WHEN 2
         THEN SUM( settlement_price ) OVER (
                ORDER BY contract_date ROWS BETWEEN 1 FOLLOWING AND 2 FOLLOWING
              )
       END AS sum_column
FROM   table_name;
Or you can use LEAD:
SELECT contract_date,
       settlement_price,
       LEAD( settlement_price, 1, NULL ) OVER ( ORDER BY contract_date )
         + LEAD( settlement_price, 2, NULL ) OVER ( ORDER BY contract_date )
         AS sum_column
FROM   table_name;
So, for the test data:
CREATE TABLE table_name ( contract_date, settlement_price ) AS
SELECT DATE '2020-10-01', 50 FROM DUAL UNION ALL
SELECT DATE '2020-11-01', 10 FROM DUAL UNION ALL
SELECT DATE '2020-12-01', 20 FROM DUAL UNION ALL
SELECT DATE '2021-01-01', 30 FROM DUAL UNION ALL
SELECT DATE '2021-02-01', 50 FROM DUAL;
Both queries output:
CONTRACT_DATE | SETTLEMENT_PRICE | SUM_COLUMN
:------------ | ---------------: | ---------:
01-OCT-20 | 50 | 30
01-NOV-20 | 10 | 50
01-DEC-20 | 20 | 80
01-JAN-21 | 30 | null
01-FEB-21 | 50 | null
SUM (Settlement_Price) Over (order by Contract_date Rows between 1 following and 2 following)
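For a general x, the CASE/COUNT pattern above still applies; only the frame bound changes. A minimal sketch for x = 3, under the same table and column assumptions as above (the two occurrences of 3 are the only things to edit):
SELECT contract_date,
       settlement_price,
       CASE COUNT(*) OVER (
              ORDER BY contract_date ROWS BETWEEN 1 FOLLOWING AND 3 FOLLOWING
            )
         WHEN 3
         THEN SUM( settlement_price ) OVER (
                ORDER BY contract_date ROWS BETWEEN 1 FOLLOWING AND 3 FOLLOWING
              )
       END AS sum_column
FROM   table_name;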

Get value for the first date in month

I have weekly data of each product's stock. I want to group it by year-month and get the first value of each month. In other words, I want the opening stock of each month, regardless of the day of the month.
+------------+---------+
| MyDate | MyValue |
+------------+---------+
| 2018-01-06 | 2 |*
| 2018-01-13 | 7 |
| 2018-01-20 | 5 |
| 2018-01-27 | 2 |
| 2018-02-03 | 3 |*
| 2018-02-10 | 10 |
| 2018-02-17 | 6 |
| 2018-02-24 | 4 |
| 2018-03-03 | 7 |*
| 2018-03-10 | 5 |
| 2018-03-17 | 3 |
| 2018-03-24 | 4 |
| 2018-03-31 | 6 |
+------------+---------+
Desired results:
+----------------+---------+
| FirstDayOfMonth| MyValue |
+----------------+---------+
| 2018-01-01 | 2 |
| 2018-02-01 | 3 |
| 2018-03-01 | 7 |
+----------------+---------+
I thought this might work, but it doesn't:
select
[product],
datefromparts(year([MyDate]), month([MyDate]), 1),
FIRST_VALUE(MyValue) OVER (PARTITION BY [Product], YEAR([MyDate]), MONTH([MyDate]) ORDER BY [MyDate] ASC) AS MyValue
from
MyTable
group by
[Product],
YEAR([MyDate]), MONTH([MyDate])
Edit: Thank you. The point of my question is not how to get the first day of the month; I know there are different techniques for that.
The point is how to get the FIRST value in the month (the opening stock). If there were a way to get the closing stock in the same shot, that would be great. The ROW_NUMBER-based answers don't allow getting the closing stock in one shot; they would require two joins.
Edit after accepting answer
Please consider John Cappelletti's answer as an alternative to the accepted one: https://stackoverflow.com/a/53559750/1903793
You don't really need the GROUP BY if you have chosen the window function route:
SELECT Product, DATEADD(DAY, 1, EOMONTH(MyDate, -1)) AS Month, MyValue
FROM (
SELECT *, ROW_NUMBER() OVER (PARTITION BY Product, DATEADD(DAY, 1, EOMONTH(MyDate, -1)) ORDER BY MyDate) AS rn
FROM t
) AS x
WHERE rn = 1
UPDATE
To get the last row of the month, just UNION ALL the above query with a copy whose ROW_NUMBER uses ORDER BY MyDate DESC. This will give you two rows per product-month, as sketched below.
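A minimal sketch of that UNION ALL idea, under the same table and column assumptions as above (the 'open'/'close' tag column is only added here to tell the two rows apart):
SELECT Product, DATEADD(DAY, 1, EOMONTH(MyDate, -1)) AS Month, MyValue, 'open' AS StockType
FROM (
    SELECT *, ROW_NUMBER() OVER (PARTITION BY Product, DATEADD(DAY, 1, EOMONTH(MyDate, -1)) ORDER BY MyDate) AS rn
    FROM t
) AS x
WHERE rn = 1
UNION ALL
SELECT Product, DATEADD(DAY, 1, EOMONTH(MyDate, -1)), MyValue, 'close'
FROM (
    SELECT *, ROW_NUMBER() OVER (PARTITION BY Product, DATEADD(DAY, 1, EOMONTH(MyDate, -1)) ORDER BY MyDate DESC) AS rn
    FROM t
) AS x
WHERE rn = 1;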
You can use CROSS APPLY and EOMONTH to find the last day of the previous month and add one day:
select distinct dateadd(day, 1, eomonth(t1.mydate, -1)) as FirstDayOfMonth, t1.myvalue
from table t cross apply
( select top (1) t1.mydate, t1.myvalue
from table t1
where t1.product = t.product and
year(t1.MyDate) = year(t.MyDate) and month(t1.MyDate) = month(t.MyDate)
order by t1.mydate
) t1;
You could also use ROW_NUMBER and a CTE.
WITH CTE as (
SELECT '2018-01-06' myDate, 2 Myvalue UNION ALL
SELECT '2018-01-13', 7 UNION ALL
SELECT '2018-01-20', 5 UNION ALL
SELECT '2018-01-27', 2 UNION ALL
SELECT '2018-02-03', 3 UNION ALL
SELECT '2018-02-10', 10 UNION ALL
SELECT '2018-02-17', 6 UNION ALL
SELECT '2018-02-24', 4 UNION ALL
SELECT '2018-03-03', 7 UNION ALL
SELECT '2018-03-10', 5 UNION ALL
SELECT '2018-03-17', 3 UNION ALL
SELECT '2018-03-24', 4 UNION ALL
SELECT '2018-03-31', 6),
CTE2 as (SELECT *
, Row_Number() over (partition by DATEADD(month, DATEDIFF(month, 0, MyDate), 0) order by myDate) RN
FROM CTE)
SELECT DATEADD(month, DATEDIFF(month, 0, MyDate), 0), MyValue
FROM cte2
WHERE RN = 1
Giving us:
+----+---------------------+---------+
| | (No column name) | MyValue |
+----+---------------------+---------+
| 1 | 01.01.2018 00:00:00 | 2 |
| 2 | 01.02.2018 00:00:00 | 3 |
| 3 | 01.03.2018 00:00:00 | 7 |
+----+---------------------+---------+
Just another option is using WITH TIES, and then a little cheat for the date.
Example
Select top 1 with ties
MyDate = convert(varchar(7),MyDate,120)+'-01'
,MyValue
from YourTable
Order By Row_Number() over (Partition By convert(varchar(7),MyDate,120) Order By MyDate)
Returns
MyDate MyValue
2018-01-01 2
2018-02-01 3
2018-03-01 7

How to return counts in a lookback period, unique to multiple fields?

Here is a sample of the dataset I have (~10 TB)
+----+------------+----------+----------------+--------------+
| id | date       | campaign | campaign_start | campaign_end |
+----+------------+----------+----------------+--------------+
| 1  | 2018-01-01 | 1        | 2018-01-01     | 2018-02-03   |
| 1  | 2018-02-01 | 2        | 2018-02-01     | 2018-02-03   |
| 1  | 2018-02-02 | 2        | 2018-02-01     | 2018-02-03   |
| 1  | 2018-02-03 | 2        | 2018-02-01     | 2018-02-03   |
| 2  | 2018-01-23 | 1        | 2018-01-01     | 2018-02-03   |
| 2  | 2018-02-03 | 2        | 2018-02-01     | 2018-02-03   |
+----+------------+----------+----------------+--------------+
I want to, for every unique id + campaign:
Get the frequency of occurrences of that id within the period of that specific campaign
Get the frequency of occurrences of that id within a variable lookback period (say 3 months) before the start of the campaign, i.e. dates >= campaign_start - 3 months and before campaign_start
Get the earliest (first) and latest (last) date in that window
What I would like the output to be is:
+----+----------+--------------------+--------------------------+----------------+--------------+------------+------------+
| id | campaign | campaign_frequency | total_lookback_frequency | campaign_start | campaign_end | first_date | last_date  |
+----+----------+--------------------+--------------------------+----------------+--------------+------------+------------+
| 1  | 1        | 1                  | 1                        | 2018-01-01     | 2018-02-03   | 2018-01-01 | 2018-01-01 |
| 1  | 2        | 3                  | 4                        | 2018-02-01     | 2018-02-03   | 2018-01-01 | 2018-02-03 |
| 2  | 1        | 1                  | 1                        | 2018-01-01     | 2018-02-03   | 2018-01-23 | 2018-01-23 |
| 2  | 2        | 1                  | 2                        | 2018-02-01     | 2018-02-03   | 2018-01-23 | 2018-02-03 |
+----+----------+--------------------+--------------------------+----------------+--------------+------------+------------+
The problem I have been having is that I can't get total_lookback_frequency to work properly; it always returns the same result as campaign_frequency (which is just a COUNT(id) grouped by id, campaign).
Below is what I had (that isn't working):
SELECT
id,
campaign,
min(date) as first_date,
max(date) as end_date,
count(id) as total_lookback_frequency,
WHERE
date >= sub(date, INTERVAL 730 hour)
GROUP BY
id,
campaign,
date
Would you be able to help out here?
Thanks!
Below is for BigQuery Standard SQL
#standardSQL
SELECT
  id,
  campaign,
  COUNT(1) campaign_frequency,
  (
    SELECT COUNT(1)
    FROM `project.dataset.table`
    WHERE id = t.id
      AND dt BETWEEN DATE_SUB(t.campaign_start, INTERVAL 3 MONTH) AND DATE_SUB(t.campaign_start, INTERVAL 1 DAY)
  ) total_lookback_frequency,
  campaign_start,
  campaign_end,
  MIN(dt) AS first_date,
  MAX(dt) AS end_date
FROM `project.dataset.table` t
GROUP BY id, campaign, campaign_start, campaign_end
You can test and play with the above using the dummy data from your question, as below:
#standardSQL
WITH `project.dataset.table` AS (
SELECT 1 id, DATE '2018-01-01' dt, 1 campaign, DATE '2018-01-01' campaign_start, DATE '2018-02-03' campaign_end UNION ALL
SELECT 1, '2018-02-01', 2, '2018-02-01', '2018-02-03' UNION ALL
SELECT 1, '2018-02-02', 2, '2018-02-01', '2018-02-03' UNION ALL
SELECT 1, '2018-02-03', 2, '2018-02-01', '2018-02-03' UNION ALL
SELECT 2, '2018-01-23', 1, '2018-01-01', '2018-02-03' UNION ALL
SELECT 2, '2018-02-03', 2, '2018-02-01', '2018-02-03'
)
SELECT
id,
campaign,
COUNT(1) campaign_frequency,
(
SELECT COUNT(1)
FROM `project.dataset.table`
WHERE id = t.id
AND dt BETWEEN DATE_SUB(t.campaign_start, INTERVAL 3 MONTH) AND DATE_SUB(t.campaign_start, INTERVAL 1 DAY)
) total_lookback_frequency,
campaign_start,
campaign_end,
MIN(dt) AS first_date,
MAX(dt) AS end_date
FROM `project.dataset.table` t
GROUP BY id, campaign, campaign_start, campaign_end
-- ORDER BY id, campaign
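As a hedged alternative sketch, the same lookback count can also be expressed as a LEFT JOIN plus aggregation instead of a correlated subquery, which may be easier to reason about at the ~10 TB scale mentioned; table and column names are the same assumptions as above:
#standardSQL
WITH campaigns AS (
  SELECT id, campaign, campaign_start, campaign_end,
         COUNT(1) AS campaign_frequency,
         MIN(dt) AS first_date,
         MAX(dt) AS end_date
  FROM `project.dataset.table`
  GROUP BY id, campaign, campaign_start, campaign_end
)
SELECT c.id, c.campaign, c.campaign_frequency,
       COUNT(l.dt) AS total_lookback_frequency,
       c.campaign_start, c.campaign_end, c.first_date, c.end_date
FROM campaigns c
LEFT JOIN `project.dataset.table` l
  ON l.id = c.id
  AND l.dt BETWEEN DATE_SUB(c.campaign_start, INTERVAL 3 MONTH)
               AND DATE_SUB(c.campaign_start, INTERVAL 1 DAY)
GROUP BY c.id, c.campaign, c.campaign_frequency,
         c.campaign_start, c.campaign_end, c.first_date, c.end_date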

Split rows on different days if summing hours value to given day exceeds midnight

I have a structure like this
+-----+-----+------------+----------+------+----------------------+---+
| Row | id | date | time | hour | description | |
+-----+-----+------------+----------+------+----------------------+---+
| 1 | foo | 2018-03-02 | 19:00:00 | 8 | across single day | |
| 2 | bar | 2018-03-02 | 23:00:00 | 1 | end at midnight | |
| 3 | qux | 2018-03-02 | 10:00:00 | 3 | inside single day | |
| 4 | quz | 2018-03-02 | 23:15:00 | 2 | with minutes | |
+-----+-----+------------+----------+------+----------------------+---+
(I added the description column only to give context; it is not needed for the analysis.)
Here is the statement to generate the table:
WITH table AS (
  SELECT "foo" AS id, CURRENT_DATE() AS date, TIME(19,0,0) AS time, 8 AS hour
  UNION ALL
  SELECT "bar", CURRENT_DATE(), TIME(23,0,0), 1
  UNION ALL
  SELECT "qux", CURRENT_DATE(), TIME(10,0,0), 3
  UNION ALL
  SELECT "quz", CURRENT_DATE(), TIME(23,15,0), 2
)
SELECT * FROM table
Adding the hour value to the given time, I need to split the row into multiple rows if the sum spills over into the next day.
Jumps over multiple days do NOT need to be considered, e.g. +27 hours (this should simplify the scenario).
My initial idea was to start by adding the hours value to a datetime field, in order to obtain the start and end limits of the interval:
SELECT
id,
DATETIME(date, time) AS date_start,
DATETIME_ADD(DATETIME(date, time), INTERVAL hour HOUR) AS date_end
FROM table
Here is the result:
+-----+-----+---------------------+---------------------+
| Row | id  | date_start          | date_end            |
+-----+-----+---------------------+---------------------+
| 1   | foo | 2018-03-02T19:00:00 | 2018-03-03T03:00:00 |
| 2   | bar | 2018-03-02T23:00:00 | 2018-03-03T00:00:00 |
| 3   | qux | 2018-03-02T10:00:00 | 2018-03-02T13:00:00 |
| 4   | quz | 2018-03-02T23:15:00 | 2018-03-03T01:15:00 |
+-----+-----+---------------------+---------------------+
but now I'm stuck on how to proceed from the existing interval.
Starting from this table, the rows should be split whenever the day changes, like this:
+-----+-----+------------+------------+----------+-------+
| Row | id  | date       | hour_start | hour_end | hours |
+-----+-----+------------+------------+----------+-------+
| 1   | foo | 2018-03-02 | 19:00:00   | 00:00:00 | 5     |
| 2   | foo | 2018-03-03 | 00:00:00   | 03:00:00 | 3     |
| 3   | bar | 2018-03-02 | 23:00:00   | 00:00:00 | 1     |
| 4   | qux | 2018-03-02 | 10:00:00   | 13:00:00 | 3     |
| 5   | quz | 2018-03-02 | 23:15:00   | 00:00:00 | 0.75  |
| 6   | quz | 2018-03-03 | 00:00:00   | 01:15:00 | 1.25  |
+-----+-----+------------+------------+----------+-------+
I tried to study a similar, already analyzed scenario, but I was unable to adapt it to handle the day component as well.
My final scenario will combine this approach with the one analyzed in the other question (split into single days, then split on given breaks of hours), but I can tackle the two themes separately: first split by day (this question), then split on time breaks (the other question).
Interesting problem ... I tried the following:
Create a second table creating all the new rows starting at midnight
UNION ALL it with the source table while correcting the hours of the old rows accordingly
Commented Result:
WITH table AS (
  SELECT "foo" AS id, CURRENT_DATE() AS date, TIME(19,0,0) AS time, 8 AS hour
  UNION ALL
  SELECT "bar", CURRENT_DATE(), TIME(23,0,0), 1
  UNION ALL
  SELECT "qux", CURRENT_DATE(), TIME(10,0,0), 3
)
, table2 AS (
  SELECT
    id,
    -- create datetime, add hours, then cast as date again
    CAST( DATETIME_ADD( DATETIME(date, time), INTERVAL hour HOUR ) AS DATE ) AS date,
    TIME(0,0,0) AS time  -- losing minutes and seconds
    -- subtract the hours up to midnight
    , hour - (24 - EXTRACT(HOUR FROM time)) AS hour
  FROM
    table
  WHERE
    date != CAST( DATETIME_ADD( DATETIME(date, time), INTERVAL hour HOUR ) AS DATE ) )
SELECT
  id
  , date
  , time
  -- correct hour if the row is split at midnight
  , IF(EXTRACT(HOUR FROM time) + hour > 24, 24 - EXTRACT(HOUR FROM time), hour) AS hour
FROM
  table
UNION ALL
SELECT
  *
FROM
  table2
Hope it makes sense.
Of course, if you need to consider jumps over multiple days, the correction fails :)
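If jumps over multiple days ever do need handling, here is a hedged sketch of one way to do it (my own assumption, not part of the approach above): cross join each row with the days it spans via GENERATE_DATE_ARRAY and clamp every daily slice to the overall interval. It reuses the id/date/time/hour columns from the question; a time_end of 00:00:00 means "up to midnight":
#standardSQL
WITH table AS (
  SELECT "foo" AS id, CURRENT_DATE() AS date, TIME(19,0,0) AS time, 27 AS hour  -- a multi-day jump
  UNION ALL
  SELECT "bar", CURRENT_DATE(), TIME(23,0,0), 1
),
spans AS (
  SELECT id,
         DATETIME(date, time) AS dt_start,
         DATETIME_ADD(DATETIME(date, time), INTERVAL CAST(hour * 3600 AS INT64) SECOND) AS dt_end
  FROM table
)
SELECT s.id,
       d AS date,
       TIME(GREATEST(s.dt_start, DATETIME(d))) AS time_start,
       TIME(LEAST(s.dt_end, DATETIME(DATE_ADD(d, INTERVAL 1 DAY)))) AS time_end,
       DATETIME_DIFF(LEAST(s.dt_end, DATETIME(DATE_ADD(d, INTERVAL 1 DAY))),
                     GREATEST(s.dt_start, DATETIME(d)), MINUTE) / 60.0 AS hours
FROM spans s, UNNEST(GENERATE_DATE_ARRAY(DATE(s.dt_start), DATE(s.dt_end))) AS d
-- drop the empty slice that appears when a span ends exactly at midnight
WHERE GREATEST(s.dt_start, DATETIME(d)) < LEAST(s.dt_end, DATETIME(DATE_ADD(d, INTERVAL 1 DAY)))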
Here is a possible solution I came up with, starting from @Martin Weitzmann's approach.
I used two different paths:
ids where there is a "jump" to the next day
ids which stay within the same day
and a final UNION ALL of the two sets.
I forgot to mention initially that the hours value of the input can be a float (a fraction of an hour), so I added that too.
#standardSQL
WITH
input AS (
  -- change of day
  SELECT "bap" AS id, CURRENT_DATE() AS date, TIME(19,0,0) AS time, 8.0 AS hour UNION ALL
  -- end at midnight
  SELECT "bar", CURRENT_DATE(), TIME(23,0,0), 1.0 UNION ALL
  -- inside single day
  SELECT "foo", CURRENT_DATE(), TIME(10,0,0), 3.0 UNION ALL
  -- change of day with minutes and float hours
  SELECT "qux", CURRENT_DATE(), TIME(23,15,0), 2.5 UNION ALL
  -- start from midnight
  SELECT "quz", CURRENT_DATE(), TIME(0,0,0), 4.5
),
-- Calculate end_date and end_time by adding the hours value
table AS (
  SELECT
    id,
    date AS start_date,
    time AS start_time,
    EXTRACT(DATE FROM DATETIME_ADD(DATETIME(date, time), INTERVAL CAST(hour*3600 AS INT64) SECOND)) AS end_date,
    EXTRACT(TIME FROM DATETIME_ADD(DATETIME(date, time), INTERVAL CAST(hour*3600 AS INT64) SECOND)) AS end_time
  FROM input
),
-- portion that starts at start_time and ends at midnight
start_to_midnight AS (
  SELECT
    id,
    start_time,
    start_date,
    TIME(23,59,59) AS end_time,
    start_date AS end_date
  FROM table
  WHERE end_date > start_date
),
-- portion that starts at midnight and ends at end_time
midnight_to_end AS (
  SELECT
    id,
    TIME(0,0,0) AS start_time,
    end_date AS start_date,
    end_time,
    end_date
  FROM table
  WHERE end_date > start_date
    -- Avoid rows that start at 0:0:0 and end at 0:0:0 (original row ends at 0:0:0)
    AND end_time != TIME(0,0,0)
)
-- Union of the 3 tables
SELECT
  id,
  start_date,
  start_time,
  end_time
FROM (
  SELECT id, start_time, end_time, start_date FROM table WHERE start_date = end_date
  UNION ALL
  SELECT id, start_time, end_time, start_date FROM start_to_midnight
  UNION ALL
  SELECT id, start_time, end_time, start_date FROM midnight_to_end
)
ORDER BY id, start_date, start_time
Here is the provided output
+-----+-----+------------+------------+----------+
| Row | id  | start_date | start_time | end_time |
+-----+-----+------------+------------+----------+
| 1   | bap | 2018-03-03 | 19:00:00   | 23:59:59 |
| 2   | bap | 2018-03-04 | 00:00:00   | 03:00:00 |
| 3   | bar | 2018-03-03 | 23:00:00   | 23:59:59 |
| 4   | foo | 2018-03-03 | 10:00:00   | 13:00:00 |
| 5   | qux | 2018-03-03 | 23:15:00   | 23:59:59 |
| 6   | qux | 2018-03-04 | 00:00:00   | 01:45:00 |
| 7   | quz | 2018-03-03 | 00:00:00   | 04:30:00 |
+-----+-----+------------+------------+----------+

SQL - Grouping with aggregation

I have a table (TABLE1) that lists all employees with their Dept IDs, the date they started and the date they were terminated (NULL means they are current employees).
I would like to have a result set (TABLE2) in which every row represents a day (the DATE field), starting from the day the first employee started (in the sample table below, that date is 20090101) until today. I would like to group the employees by DeptID and calculate the total number of employees for each row of TABLE2.
How do I write this query? Thanks in advance for your help.
TABLE1
DeptID EmployeeID StartDate EndDate
--------------------------------------------
001 123 20100101 20120101
001 124 20090101 NULL
001 234 20110101 20120101
TABLE2
DeptID Date EmployeeCount
-----------------------------------
001 20090101 1
001 20090102 1
... ... 1
001 20100101 2
001 20100102 2
... ... 2
001 20110101 3
001 20110102 3
... ... 3
001 20120101 1
001 20120102 1
001 20120103 1
... ... 1
This will work if you have a date lookup table. You will need to specify the department ID.
Query
SELECT d.dt, SUM(e.ecount) AS RunningTotal
FROM dates d
INNER JOIN
    (SELECT b.dt,
            CASE
                WHEN c.ecount IS NULL THEN 0
                ELSE c.ecount
            END AS ecount
     FROM dates b
     LEFT JOIN
         (SELECT a.DeptID, a.dt, SUM([count]) AS ecount
          FROM
              (SELECT DeptID, EmployeeID, 1 AS [count], StartDate AS dt FROM TABLE1
               UNION ALL
               SELECT DeptID, EmployeeID,
                      CASE
                          WHEN EndDate IS NOT NULL THEN -1
                          ELSE 0
                      END AS [count], EndDate AS dt FROM TABLE1) a
          WHERE a.dt IS NOT NULL AND DeptID = 1
          GROUP BY a.DeptID, a.dt) c ON c.dt = b.dt) e ON e.dt <= d.dt
GROUP BY d.dt
Result
| DT | RUNNINGTOTAL |
-----------------------------
| 2009-01-01 | 1 |
| 2009-02-01 | 1 |
| 2009-03-01 | 1 |
| 2009-04-01 | 1 |
| 2009-05-01 | 1 |
| 2009-06-01 | 1 |
| 2009-07-01 | 1 |
| 2009-08-01 | 1 |
| 2009-09-01 | 1 |
| 2009-10-01 | 1 |
| 2009-11-01 | 1 |
| 2009-12-01 | 1 |
| 2010-01-01 | 2 |
| 2010-02-01 | 2 |
| 2010-03-01 | 2 |
| 2010-04-01 | 2 |
| 2010-05-01 | 2 |
| 2010-06-01 | 2 |
| 2010-07-01 | 2 |
| 2010-08-01 | 2 |
| 2010-09-01 | 2 |
| 2010-10-01 | 2 |
| 2010-11-01 | 2 |
| 2010-12-01 | 2 |
| 2011-01-01 | 3 |
| 2011-02-01 | 3 |
| 2011-03-01 | 3 |
| 2011-04-01 | 3 |
| 2011-05-01 | 3 |
| 2011-06-01 | 3 |
| 2011-07-01 | 3 |
| 2011-08-01 | 3 |
| 2011-09-01 | 3 |
| 2011-10-01 | 3 |
| 2011-11-01 | 3 |
| 2011-12-01 | 3 |
| 2012-01-01 | 1 |
Schema
CREATE TABLE TABLE1 (
DeptID tinyint,
EmployeeID tinyint,
StartDate date,
EndDate date)
INSERT INTO TABLE1 VALUES
(1, 123, '2010-01-01', '2012-01-01'),
(1, 124, '2009-01-01', NULL),
(1, 234, '2011-01-01', '2012-01-01')
CREATE TABLE dates (
dt date)
INSERT INTO dates VALUES
('2009-01-01'), ('2009-02-01'), ('2009-03-01'), ('2009-04-01'), ('2009-05-01'),
('2009-06-01'), ('2009-07-01'), ('2009-08-01'), ('2009-09-01'), ('2009-10-01'),
('2009-11-01'), ('2009-12-01'), ('2010-01-01'), ('2010-02-01'), ('2010-03-01'),
('2010-04-01'), ('2010-05-01'), ('2010-06-01'), ('2010-07-01'), ('2010-08-01'),
('2010-09-01'), ('2010-10-01'), ('2010-11-01'), ('2010-12-01'), ('2011-01-01'),
('2011-02-01'), ('2011-03-01'), ('2011-04-01'), ('2011-05-01'), ('2011-06-01'),
('2011-07-01'), ('2011-08-01'), ('2011-09-01'), ('2011-10-01'), ('2011-11-01'),
('2011-12-01'), ('2012-01-01')
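As a side note, here is a hedged sketch (my own addition, assuming the same dates table as above) for generating that lookup table with a recursive CTE instead of hand-written INSERTs; it produces the monthly rows shown, and switching DATEADD to DAY gives a daily calendar:
WITH d AS (
    SELECT CAST('2009-01-01' AS date) AS dt
    UNION ALL
    SELECT DATEADD(MONTH, 1, dt) FROM d WHERE dt < '2012-01-01'
)
INSERT INTO dates (dt)
SELECT dt FROM d
OPTION (MAXRECURSION 0);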
You need something along these lines:
SELECT *
, ( SELECT COUNT(EmployeeID) AS EmployeeCount
FROM TABLE1 AS f
WHERE t.[Date] BETWEEN f.StartDate AND f.EndDate
)
FROM ( SELECT DeptID
, StartDate AS [Date]
FROM TABLE1
UNION
SELECT DeptID
, EndDate AS [Date]
FROM TABLE1
) AS t
EDIT: since the OP clarified that he wants all the dates, here is the updated solution.
I have excluded an employee from the count if his job ends on that date. If you want to include him, change t.[Date] < f.EndDate to t.[Date] <= f.EndDate in the solution below. I also assume that a NULL EndDate means the employee still works for the department.
DECLARE @StartDate DATE = (SELECT MIN(StartDate) FROM Table1)
       ,@EndDate DATE = (SELECT MAX(EndDate) FROM Table1)
;WITH CTE AS
(
SELECT DISTINCT DeptID, @StartDate AS [Date] FROM Table1
UNION ALL
SELECT c.DeptID, DATEADD(dd,1,c.[Date]) AS [Date] FROM CTE AS c
WHERE c.[Date] <= @EndDate
)
SELECT * ,
EmployeeCount=( SELECT COUNT(EmployeeID)
FROM TABLE1 AS f
WHERE f.DeptID=t.DeptID AND t.[Date] >= f.StartDate
AND ( t.[Date] < f.EndDate OR f.EndDate IS NULL )
)
FROM CTE AS t
ORDER BY 1
OPTION ( MAXRECURSION 0 )
Here is a SQL Fiddle demo; I have added another department and an employee to it.
http://sqlfiddle.com/#!3/5c4ec/1