Count previous row only if > 2 days later - sql

In SQL Server 2012 using Studio: I need results displayed count of distinct clientnumbers (CN) for re-entry, grouped by Type like this:
Type CountOfCN
5 1
10 3
Only a RE-entry counts (ENTRY_NO 1 never counts) and it has to be more than 2 days after the end of the previous entry for that clientnumber. So basically ENTRY_NO 1 doesn't count. ENTRY_NO 2 counts if it's startdate is more than 2 days after the enddate of ENTRY_NO 1, and so on with ENTRY_NO 3, 4, 5.
I got ENTRY_NO by doing a ROW_NUMBER function when I created the table. I have no idea how to go about creating a datediff or dateadd function (?) to look at the previous row's enddate and calculate it with my startdate for each CN?
Here is my table:
CN STARTDATE ENDDATE TYPE ENTRY_NO
1 1/1/2018 1/20/2018 10 1
1 1/21/2018 1/30/2018 5 2
1 2/3/2018 NULL 10 3
2 1/1/2018 1/20/2018 10 1
2 1/27/2018 1/30/2018 10 2
3 1/1/2018 1/20/2018 5 1
3 1/27/2018 1/30/2018 10 2
3 2/10/2018 2/20/2018 5 3
4 1/7/2018 1/30/2018 5 1
5 1/27/2018 1/30/2018 5 1
5 1/31/2018 NULL 5 2
So the rows that should be in the results are ENTRY_NO 2 for CN 1, ENTRY_NO 2 for CN 2, ENTRY_NO 2 & 3 for CN 3.
Only the last Entry may/may not have a NULL enddate

Using the LAG window function you can get the previous enddate.
SELECT *
FROM
(
SELECT * ,
LAG(ENDDATE) OVER (PARTITION BY CN ORDER BY STARTDATE) AS prevEndDate
FROM yourtable
) q
WHERE DATEDIFF(d, prevEndDate, STARTDATE) > 2
AND ENDDATE IS NOT NULL

Inner join the table to itself on the conditions you want to enforce:
Can't be Entry_No 1
The Entry_No on one side is one greater than on the other side
Previous Entry must be more than 2 days earlier
Both sides of the join have the same CN
Use that join to create a CTE or derived table, and then SELECT from it, grouping by Type and getting the COUNT(*)

So this ended up being more involved than I first thought, but here it goes...
You can run this example in SSMS.
Create a table variable matching your definition above:
DECLARE #data TABLE ( CN INT, STARTDATE DATETIME, ENDDATE DATETIME, [TYPE] INT, ENTRY_NO INT );
Insert data given:
INSERT INTO #data ( CN, STARTDATE, ENDDATE, [TYPE], ENTRY_NO ) VALUES
( 1, '1/1/2018', '1/20/2018', 10, 1 )
, ( 1, '1/21/2018', '1/30/2018', 5, 2 )
, ( 1, '2/3/2018', NULL, 10, 3 )
, ( 2, '1/1/2018', '1/20/2018', 10, 1 )
, ( 2, '1/27/2018', '1/30/2018', 10, 2 )
, ( 3, '1/1/2018', '1/20/2018', 5, 1 )
, ( 3, '1/27/2018', '1/30/2018', 10, 2 )
, ( 3, '2/10/2018', '2/20/2018', 5, 3 )
, ( 4, '1/7/2018', '1/30/2018', 5, 1 )
, ( 5, '1/27/2018', '1/30/2018', 5, 1 )
, ( 5, '1/31/2018', NULL, 5, 2 );
Confirm inserted data:
+----+-------------------------+-------------------------+------+----------+
| CN | STARTDATE | ENDDATE | TYPE | ENTRY_NO |
+----+-------------------------+-------------------------+------+----------+
| 1 | 2018-01-01 00:00:00.000 | 2018-01-20 00:00:00.000 | 10 | 1 |
| 1 | 2018-01-21 00:00:00.000 | 2018-01-30 00:00:00.000 | 5 | 2 |
| 1 | 2018-02-03 00:00:00.000 | NULL | 10 | 3 |
| 2 | 2018-01-01 00:00:00.000 | 2018-01-20 00:00:00.000 | 10 | 1 |
| 2 | 2018-01-27 00:00:00.000 | 2018-01-30 00:00:00.000 | 10 | 2 |
| 3 | 2018-01-01 00:00:00.000 | 2018-01-20 00:00:00.000 | 5 | 1 |
| 3 | 2018-01-27 00:00:00.000 | 2018-01-30 00:00:00.000 | 10 | 2 |
| 3 | 2018-02-10 00:00:00.000 | 2018-02-20 00:00:00.000 | 5 | 3 |
| 4 | 2018-01-07 00:00:00.000 | 2018-01-30 00:00:00.000 | 5 | 1 |
| 5 | 2018-01-27 00:00:00.000 | 2018-01-30 00:00:00.000 | 5 | 1 |
| 5 | 2018-01-31 00:00:00.000 | NULL | 5 | 2 |
+----+-------------------------+-------------------------+------+----------+
Run SQL to get type count given your business rules:
ENTRY_NO must be greater than 1
Current CN ENDDATE must be greater than 2 days from previous ENDDATE
T-SQL:
SELECT
[TYPE], COUNT( DISTINCT CN ) AS ClientCount
FROM #data
WHERE
CN IN (
SELECT DISTINCT CN FROM (
SELECT
dat.CN
, dat.ENTRY_NO
, dat.[TYPE]
, DATEDIFF( DD
, LAG( ENDDATE, 1, NULL ) OVER ( PARTITION BY CN ORDER BY CN, ENDDATE ) -- gets enddate for previous CN entry
, ENDDATE
) AS DayDiff
FROM #data dat
) AS Clients
WHERE
Clients.ENTRY_NO >= 2
AND Clients.DayDiff > 2
)
GROUP BY
[TYPE]
ORDER BY
[TYPE];
Returns:
+------+-------------+
| TYPE | ClientCount |
+------+-------------+
| 5 | 2 |
| 10 | 3 |
+------+-------------+
A quick look at the IN subquery shows us that CNs 1, 2, and 3 will be included during the "TYPE" count.
SELECT
dat.CN
, dat.ENTRY_NO
, dat.[TYPE]
, DATEDIFF( DD
, LAG( ENDDATE, 1, NULL ) OVER ( PARTITION BY CN ORDER BY CN, ENDDATE ) -- gets enddate for previous CN entry
, ENDDATE
) AS DayDiff
FROM #data dat
ORDER BY
dat.CN, dat.ENTRY_NO;
+----+----------+------+---------+
| CN | ENTRY_NO | TYPE | DayDiff |
+----+----------+------+---------+
| 1 | 1 | 10 | NULL |
| 1 | 2 | 5 | 10 |
| 1 | 3 | 10 | NULL |
| 2 | 1 | 10 | NULL |
| 2 | 2 | 10 | 10 |
| 3 | 1 | 5 | NULL |
| 3 | 2 | 10 | 10 |
| 3 | 3 | 5 | 21 |
| 4 | 1 | 5 | NULL |
| 5 | 1 | 5 | NULL |
| 5 | 2 | 5 | NULL |
+----+----------+------+---------+

Related

Break periods at the end of the month

SQL Server 2017
CREATE TABLE [TABLE_1]
(
PLAN_NR decimal(28,6) NULL,
START_DATE datetime NULL,
);
INSERT INTO TABLE_1 (PLAN_NR, START_DATE)
VALUES (1,'2020-05-01'), (2,'2020-08-01');
CREATE TABLE [TABLE_2]
(
PLAN_NR decimal(28,6) NULL,
PERIOD_NR decimal(28,6) NOT NULL
);
INSERT INTO TABLE_2 (PLAN_NR, PERIOD_NR)
VALUES (1, 1), (1, 2), (1, 3), (1, 4), (1, 5), (1, 6), (1, 7), (1, 8),
(2, 1), (2, 2), (2, 3), (2, 4), (2, 5), (2, 6), (2, 7), (2, 8);
SQL-FIDDLE-LINK
In TABLE_1 there are plan number and plan start date.
TABLE_2 contains period numbers for each plan number.
I would like to compute the corresponding period start dates:
Each period is exactly 7 days long, unless the period contains a month end. Then the period should be divided into a range before the end of the month up to and including the last day of the month and a range after the end of the month.
The Select:
SELECT
t1.PLAN_NR, t2.PERIOD_NR,
FORMAT(DATEADD (d ,((t2.PERIOD_NR-1)*7) , t1.START_DATE ),'yyyy-MM-dd') START_DATE
FROM
TABLE_1 t1
JOIN
TABLE_2 t2 ON t1.PLAN_NR = t2.PLAN_NR
ORDER BY
t1.PLAN_NR, t2.PERIOD_NR ASC
This returns the start data but without the extra to consider the respective month end:
+---------+-----------+------------+
| PLAN_NR | PERIOD_NR | START_DATE |
+---------+-----------+------------+
| 1 | 1 | 2020-05-01 |
| 1 | 2 | 2020-05-08 |
| 1 | 3 | 2020-05-15 |
| 1 | 4 | 2020-05-22 |
| 1 | 5 | 2020-05-29 |
| 1 | 6 | 2020-06-05 |
| 1 | 7 | 2020-06-12 |
| 1 | 8 | 2020-06-19 |
| 2 | 1 | 2020-08-05 |
| 2 | 2 | 2020-08-12 |
| 2 | 3 | 2020-08-19 |
| 2 | 4 | 2020-08-26 |
| 2 | 5 | 2020-09-01 |
| 2 | 6 | 2020-09-02 |
| 2 | 7 | 2020-09-09 |
| 2 | 8 | 2020-09-16 |
+---------+-----------+------------+
I would like an output like this:
+---------+-----------+----------------------+
| PLAN_NR | PERIOD_NR | START_DATE |
+---------+-----------+----------------------+
| 1 | 1 | 2020-05-01 |
| 1 | 2 | 2020-05-08 |
| 1 | 3 | 2020-05-15 |
| 1 | 4 | 2020-05-22 |
| 1 | 5 | 2020-05-29 |< --- period part before new month
| 1 | 6 | 2020-06-01 |< --- period part after new month
| 1 | 7 | 2020-06-05 |
| 1 | 8 | 2020-06-12 |
| 2 | 1 | 2020-08-05 |
| 2 | 2 | 2020-08-12 |
| 2 | 3 | 2020-08-19 |
| 2 | 4 | 2020-08-26 |< --- period part before new month
| 2 | 5 | 2020-09-01 |< --- period part after new month
| 2 | 6 | 2020-09-02 |
| 2 | 7 | 2020-09-09 |
| 2 | 8 | 2020-09-16 |
+---------+-----------+----------------------+
Use window functions (LEAD / LAG ) to get the start and end of the period ...
SELECT t1.PLAN_NR
, t2.PERIOD_NR
, FORMAT(DATEADD (d ,((t2.PERIOD_NR-1)*7) , t1.START_DATE ),'yyyy-MM-dd') START_DATE
, CASE
WHEN
lead(
FORMAT(DATEADD (d ,((t2.PERIOD_NR-1)*7) , t1.START_DATE ),'yyyy-MM-dd')
) over (partition by
FORMAT(DATEADD (d ,((t2.PERIOD_NR-1)*7) , t1.START_DATE ),'yyyy-MM')
order by t2.period_nr)
IS NULL THEN '< --- period part before new month'
WHEN lag(
FORMAT(DATEADD (d ,((t2.PERIOD_NR-1)*7) , t1.START_DATE ),'yyyy-MM-dd')
) over (partition by
FORMAT(DATEADD (d ,((t2.PERIOD_NR-1)*7) , t1.START_DATE ),'yyyy-MM')
order by t2.period_nr)
IS NULL THEN '< --- period part after new month'
END as period_break
from TABLE_1 t1
join TABLE_2 t2
on t1.PLAN_NR = t2.PLAN_NR
order by t1.PLAN_NR, t2.PERIOD_NR asc
SQL Fiddle
PLAN_NR PERIOD_NR START_DATE period_break
1 1 2020-05-01 < --- period part after new month
1 2 2020-05-08 (null)
1 3 2020-05-15 (null)
1 4 2020-05-22 (null)
1 5 2020-05-29 < --- period part before new month
1 6 2020-06-05 < --- period part after new month
1 7 2020-06-12 (null)
1 8 2020-06-19 < --- period part before new month
2 1 2020-08-01 < --- period part after new month
2 2 2020-08-08 (null)
2 3 2020-08-15 (null)
2 4 2020-08-22 (null)
2 5 2020-08-29 < --- period part before new month
2 6 2020-09-05 < --- period part after new month
2 7 2020-09-12 (null)
2 8 2020-09-19 < --- period part before new month
SELECT
t1.PLAN_NR, t2.PERIOD_NR,
--row_number() over() but what if PERIOD_NR is not consecutive?
t2.PERIOD_NR + SUM(num.n) OVER(PARTITION BY t2.PLAN_NR ORDER BY t2.PERIOD_NR, num.n) AS PERIOD_NR_x,
FORMAT(CASE WHEN num.n = 1 THEN DATEADD(day, 1, EOMONTH(DATEADD (d ,((t2.PERIOD_NR-1)*7) , t1.START_DATE ))) ELSE DATEADD(d ,((t2.PERIOD_NR-1)*7) , t1.START_DATE ) END, 'yyyy-MM-dd') START_DATE
FROM
TABLE_1 t1
JOIN
TABLE_2 t2 ON t1.PLAN_NR = t2.PLAN_NR
CROSS APPLY
(
SELECT 0 AS n
UNION ALL
--new row for month change
SELECT 1 AS n
WHERE DATEDIFF(month, DATEADD(d ,(t2.PERIOD_NR-1)*7 , t1.START_DATE), DATEADD(d ,t2.PERIOD_NR*7 , t1.START_DATE)) = 1
) as num
ORDER BY
t1.PLAN_NR, t2.PERIOD_NR ASC

SQL Server Get all Birthday Years

I have a table in SQL Server that is Composed of
ID, B_Day
1, 1977-02-20
2, 2001-03-10
...
I want to add rows to this table for each year of a birthday, up to the current birthday year.
i.e:
ID, B_Day
1,1977-02-20
1,1978-02-20
1,1979-02-20
...
1,2020-02-20
2, 2001-03-10
2, 2002-03-10
...
2, 2019-03-10
I'm struggling to determine what the best strategy for accomplishing this. I thought about recursively self-joining, but that creates far too many layers. Any suggestions?
The following should work
with row_gen
as (select top 200 row_number() over(order by name)-1 as rnk
from master..spt_values
)
select a.id,a.b_day,dateadd(year,rnk,b_day) incr_b_day
from dbo.t a
join row_gen b
on dateadd(year,b.rnk,a.b_day)<=getdate()
https://dbfiddle.uk/?rdbms=sqlserver_2017&fiddle=0d06c95e1914ca45ca192d0d192bd2e0
You can use recursive approach :
with cte as (
select t.id, t.b_day, convert(date, getdate()) as mx_dt
from table t
union all
select c.id, dateadd(year, 1, c.b_day), c.mx_dt
from cte c
where dateadd(year, 1, c.b_day) < c.mx_dt
)
select c.id, c.b_day
from cte c
order by c.id, c.b_day;
Default recursion is 100, you can add query hint for more recursion option (maxrecursion 0).
If your dataset is not too big, one option is to use a recursive query:
with cte as (
select id, b_day bday0, b_day, 1 lvl from mytable
union all
select
id,
bday0,
dateadd(year, lvl, bday0), lvl + 1
from cte
where dateadd(year, lvl, bday0) <= getdate()
)
select id, b_day from cte order by id, b_day
Demo on DB Fiddle:
id | b_day
-: | :---------
1 | 1977-02-20
1 | 1978-02-20
1 | 1979-02-20
1 | 1980-02-20
1 | 1981-02-20
1 | 1982-02-20
1 | 1983-02-20
1 | 1984-02-20
1 | 1985-02-20
1 | 1986-02-20
1 | 1987-02-20
1 | 1988-02-20
1 | 1989-02-20
1 | 1990-02-20
1 | 1991-02-20
1 | 1992-02-20
1 | 1993-02-20
1 | 1994-02-20
1 | 1995-02-20
1 | 1996-02-20
1 | 1997-02-20
1 | 1998-02-20
1 | 1999-02-20
1 | 2000-02-20
1 | 2001-02-20
1 | 2002-02-20
1 | 2003-02-20
1 | 2004-02-20
1 | 2005-02-20
1 | 2006-02-20
1 | 2007-02-20
1 | 2008-02-20
1 | 2009-02-20
1 | 2010-02-20
1 | 2011-02-20
1 | 2012-02-20
1 | 2013-02-20
1 | 2014-02-20
1 | 2015-02-20
1 | 2016-02-20
1 | 2017-02-20
1 | 2018-02-20
1 | 2019-02-20
1 | 2020-02-20
2 | 2001-03-01
2 | 2002-03-01
2 | 2003-03-01
2 | 2004-03-01
2 | 2005-03-01
2 | 2006-03-01
2 | 2007-03-01
2 | 2008-03-01
2 | 2009-03-01
2 | 2010-03-01
2 | 2011-03-01
2 | 2012-03-01
2 | 2013-03-01
2 | 2014-03-01
2 | 2015-03-01
2 | 2016-03-01
2 | 2017-03-01
2 | 2018-03-01
2 | 2019-03-01
2 | 2020-03-01

Cleaning up duplicate chronological values

I am trying to clean up some chronological data to remove duplicate chronological data.
Example Table:
+--------+------------+----------------+
| emp_id | department | effective_date |
+--------+------------+----------------+
| 1 | 50 | 2015-04-01 |
| 1 | 50 | 2015-05-22 |
| 1 | null | 2015-07-04 |
| 1 | null | 2015-07-24 |
| 1 | null | 2015-07-30 |
| 1 | 50 | 2015-09-07 |
| 1 | 50 | 2016-01-16 |
| 1 | null | 2016-04-23 |
| 2 | 60 | 2015-01-20 |
| 2 | 60 | 2015-11-22 |
| 2 | 60 | 2016-07-20 |
| 3 | 50 | 2015-04-02 |
| 3 | 50 | 2015-07-15 |
| 3 | 60 | 2016-01-25 |
+--------+------------+----------------+
As you can see, the same individual with the same department may have the same department but multiple effective_dates. I want to clean this up with a query to only have the first date for each department change. However, I don't want to remove instances where someone went from department 50 to null then back to 50, as those are actual changes in position.
Example Output:
+--------+------------+----------------+
| emp_id | department | effective_date |
+--------+------------+----------------+
| 1 | 50 | 2015-04-01 |
| 1 | null | 2015-07-04 |
| 1 | 50 | 2015-09-07 |
| 1 | null | 2016-04-23 |
| 2 | 60 | 2015-01-20 |
| 3 | 50 | 2015-04-02 |
| 3 | 60 | 2016-01-25 |
+--------+------------+----------------+
How can I achieve this?
My solution is
DECLARE #myTable TABLE (emp_id INT, department INT, effective_date DATE);
INSERT INTO #myTable VALUES
(1, 50 , '2015-04-01'),
(1, 50 , '2015-05-22'),
(1, null, '2015-07-04'),
(1, null, '2015-07-24'),
(1, null, '2015-07-30'),
(1, 50 , '2015-09-07'),
(1, 50 , '2016-01-16'),
(1, null, '2016-04-23'),
(2, 60 , '2015-01-20'),
(2, 60 , '2015-11-22'),
(2, 60 , '2016-07-20'),
(3, 50 , '2015-04-02'),
(3, 50 , '2015-07-15'),
(3, 60 , '2016-01-25')
;WITH T AS (
SELECT *,
RN = ROW_NUMBER() OVER(PARTITION BY emp_id ORDER BY effective_date)
FROM #myTable
)
SELECT T1.emp_id, T1.department, T1.effective_date
FROM
T T1
LEFT JOIN T T2 ON T1.emp_id = T2.emp_id AND T1.RN -1 = T2.RN
WHERE (CASE WHEN ISNULL(T1.department,'') = ISNULL(T2.department,'') THEN 1 ELSE 0 END) = 0
ORDER BY T1.emp_id, T1.RN
Result:
emp_id department effective_date
----------- ----------- --------------
1 50 2015-04-01
1 NULL 2015-07-04
1 50 2015-09-07
1 NULL 2016-04-23
2 60 2015-01-20
3 50 2015-04-02
3 60 2016-01-25
(7 row(s) affected)
For delete the duplicate values:
;WITH T AS (
SELECT *,
RN = ROW_NUMBER() OVER(PARTITION BY emp_id ORDER BY effective_date)
FROM #myTable
)
DELETE T1
FROM
T T1
LEFT JOIN T T2 ON T1.emp_id = T2.emp_id AND T1.RN -1 = T2.RN
WHERE ( CASE
WHEN ISNULL(T1.department,'') <> ISNULL(T2.department,'') THEN 1
ELSE 0 END ) = 0
An alternative for where clause
WHERE ( CASE WHEN T1.department <> T2.department
OR (T1.department IS NULL AND T2.department IS NOT NULL)
OR (T2.department IS NULL AND T1.department IS NOT NULL)
THEN 1 ELSE 0 END ) = 0
This was tougher than expected:
declare #temp as table (emp_id int, department int,effective_date date)
insert into #temp
values
(1,50,'2015-04-01')
, (1,50,'2015-05-22')
, (1, null ,'2015-07-04')
, (1, null ,'2015-07-24')
, (1, null ,'2015-07-30')
, (1,50,'2015-09-07')
, (1,50,'2016-01-16')
, (1, null ,'2016-04-23')
, (2,60,'2015-01-20')
, (2,60,'2015-11-22')
, (2,60,'2016-07-20')
, (3,50,'2015-04-02')
, (3,50,'2015-07-15')
, (3,60,'2016-01-25')
;with cte as
(
--Please not I am changing null to -1 for comparison
select emp_id,isnull(department,-1) department,effective_date
,row_number() over (partition by emp_id order by effective_date) rn
from #temp
)
,cte2 as
(
--Compare to next record
select cte.*
,ctelast.emp_id cte2Emp
,ctelast.department cte2dept
,ctelast.effective_date cte2ED
,isSame = case when cte.department=ctelast.department then 1 else 0 end
from cte
join cte ctelast
on cte.emp_id=ctelast.emp_id and cte.rn = ctelast.rn-1
)
/*
Result of above:
emp_id department effective_date rn cte2Emp cte2dept cte2ED isSame
1 50 2015-04-01 1 1 50 2015-05-22 1
1 50 2015-05-22 2 1 -1 2015-07-04 0
1 -1 2015-07-04 3 1 -1 2015-07-24 1
1 -1 2015-07-24 4 1 -1 2015-07-30 1
1 -1 2015-07-30 5 1 50 2015-09-07 0
1 50 2015-09-07 6 1 50 2016-01-16 1
1 50 2016-01-16 7 1 -1 2016-04-23 0
2 60 2015-01-20 1 2 60 2015-11-22 1
2 60 2015-11-22 2 2 60 2016-07-20 1
3 50 2015-04-02 1 3 50 2015-07-15 1
3 50 2015-07-15 2 3 60 2016-01-25 0
*/
--Now you want both the first record and then any changes
select emp_id,department,effective_date from cte2 where rn=1
union all
select cte2emp,cte2dept,cte2.cte2ED from cte2 where isSame=0
order by 1,3
/*
result:
emp_id department effective_date
1 50 2015-04-01
1 -1 2015-07-04
1 50 2015-09-07
1 -1 2016-04-23
2 60 2015-01-20
3 50 2015-04-02
3 60 2016-01-25
*/

Weekly Average Reports: Redshift

My Sales data for first two weeks of june, Monday Date i.e 1st Jun , 8th Jun are below
date | count
2015-06-01 03:25:53 | 1
2015-06-01 03:28:51 | 1
2015-06-01 03:49:16 | 1
2015-06-01 04:54:14 | 1
2015-06-01 08:46:15 | 1
2015-06-01 13:14:09 | 1
2015-06-01 16:20:13 | 5
2015-06-01 16:22:13 | 1
2015-06-01 16:27:07 | 1
2015-06-01 16:29:57 | 1
2015-06-01 19:16:45 | 1
2015-06-08 10:54:46 | 1
2015-06-08 15:12:10 | 1
2015-06-08 20:35:40 | 1
I need a find weekly avg of sales happened in a given range .
Complex Query:
(some_manipulation_part), ifact as
( select date, sales_count from final_result_set
) select date_part('h',date )) as h ,
date_part('dow',date )) as day_of_week ,
count(sales_count)
from final_result_set
group by h, dow.
Output :
h | day_of_week | count
3 | 1 | 3
4 | 1 | 1
8 | 1 | 1
10 | 1 | 1
13 | 1 | 1
15 | 1 | 1
16 | 1 | 8
19 | 1 | 1
20 | 1 | 1
If I try to apply avg on the above final result, It is not actually fetching correct answer!
(some_manipulation_part), ifact as
( select date, sales_count from final_result_set
) select date_part('h',date )) as h ,
date_part('dow',date )) as day_of_week ,
avg(sales_count)
from final_result_set
group by h, dow.
h | day_of_week | count
3 | 1 | 1
4 | 1 | 1
8 | 1 | 1
10 | 1 | 1
13 | 1 | 1
15 | 1 | 1
16 | 1 | 1
19 | 1 | 1
20 | 1 | 1
So I 've two mondays in the given range, it is not actually dividing by it. I am not even sure what is happening inside redshift.
To get "weekly averages" use date_trunc():
SELECT date_trunc('week', my_date_column) as week
, avg(sales_count) AS avg_sales
FROM final_result_set
GROUP BY 1;
I hope you are not actually using date as name for your date column. It's a reserved word in SQL and a basic type name, don't use it as identifier.
If you group by the day of week (DOW) you get averages per weekday. and sunday is 0. (Use ISODOW to get 7 for Sunday.)

SQL - Grouping with aggregation

I have a table (TABLE1) that lists all employees with their Dept IDs, the date they started and the date they were terminated (NULL means they are current employees).
I would like to have a resultset (TABLE2) , in which every row represents a day starting since the first employee started( in the sample table below, that date is 20090101 ), till today. (the DATE field). I would like to group the employees by DeptID and calculate the total number of employees for each row of TABLE2.
How do I this query? Thanks for your help, in advance.
TABLE1
DeptID EmployeeID StartDate EndDate
--------------------------------------------
001 123 20100101 20120101
001 124 20090101 NULL
001 234 20110101 20120101
TABLE2
DeptID Date EmployeeCount
-----------------------------------
001 20090101 1
001 20090102 1
... ... 1
001 20100101 2
001 20100102 2
... ... 2
001 20110101 3
001 20110102 3
... ... 3
001 20120101 1
001 20120102 1
001 20120103 1
... ... 1
This will work if you have a date look up table. You will need to specify the department ID. See it in action.
Query
SELECT d.dt, SUM(e.ecount) AS RunningTotal
FROM dates d
INNER JOIN
(SELECT b.dt,
CASE
WHEN c.ecount IS NULL THEN 0
ELSE c.ecount
END AS ecount
FROM dates b
LEFT JOIN
(SELECT a.DeptID, a.dt, SUM([count]) AS ecount
FROM
(SELECT DeptID, EmployeeID, 1 AS [count], StartDate AS dt FROM TABLE1
UNION ALL
SELECT DeptID, EmployeeID,
CASE
WHEN EndDate IS NOT NULL THEN -1
ELSE 0
END AS [count], EndDate AS dt FROM TABLE1) a
WHERE a.dt IS NOT NULL AND DeptID = 1
GROUP BY a.DeptID, a.dt) c ON c.dt = b.dt) e ON e.dt <= d.dt
GROUP BY d.dt
Result
| DT | RUNNINGTOTAL |
-----------------------------
| 2009-01-01 | 1 |
| 2009-02-01 | 1 |
| 2009-03-01 | 1 |
| 2009-04-01 | 1 |
| 2009-05-01 | 1 |
| 2009-06-01 | 1 |
| 2009-07-01 | 1 |
| 2009-08-01 | 1 |
| 2009-09-01 | 1 |
| 2009-10-01 | 1 |
| 2009-11-01 | 1 |
| 2009-12-01 | 1 |
| 2010-01-01 | 2 |
| 2010-02-01 | 2 |
| 2010-03-01 | 2 |
| 2010-04-01 | 2 |
| 2010-05-01 | 2 |
| 2010-06-01 | 2 |
| 2010-07-01 | 2 |
| 2010-08-01 | 2 |
| 2010-09-01 | 2 |
| 2010-10-01 | 2 |
| 2010-11-01 | 2 |
| 2010-12-01 | 2 |
| 2011-01-01 | 3 |
| 2011-02-01 | 3 |
| 2011-03-01 | 3 |
| 2011-04-01 | 3 |
| 2011-05-01 | 3 |
| 2011-06-01 | 3 |
| 2011-07-01 | 3 |
| 2011-08-01 | 3 |
| 2011-09-01 | 3 |
| 2011-10-01 | 3 |
| 2011-11-01 | 3 |
| 2011-12-01 | 3 |
| 2012-01-01 | 1 |
Schema
CREATE TABLE TABLE1 (
DeptID tinyint,
EmployeeID tinyint,
StartDate date,
EndDate date)
INSERT INTO TABLE1 VALUES
(1, 123, '2010-01-01', '2012-01-01'),
(1, 124, '2009-01-01', NULL),
(1, 234, '2011-01-01', '2012-01-01')
CREATE TABLE dates (
dt date)
INSERT INTO dates VALUES
('2009-01-01'), ('2009-02-01'), ('2009-03-01'), ('2009-04-01'), ('2009-05-01'),
('2009-06-01'), ('2009-07-01'), ('2009-08-01'), ('2009-09-01'), ('2009-10-01'),
('2009-11-01'), ('2009-12-01'), ('2010-01-01'), ('2010-02-01'), ('2010-03-01'),
('2010-04-01'), ('2010-05-01'), ('2010-06-01'), ('2010-07-01'), ('2010-08-01'),
('2010-09-01'), ('2010-10-01'), ('2010-11-01'), ('2010-12-01'), ('2011-01-01'),
('2011-02-01'), ('2011-03-01'), ('2011-04-01'), ('2011-05-01'), ('2011-06-01'),
('2011-07-01'), ('2011-08-01'), ('2011-09-01'), ('2011-10-01'), ('2011-11-01'),
('2011-12-01'), ('2012-01-01')
you need somthing along these lines.
SELECT *
, ( SELECT COUNT(EmployeeID) AS EmployeeCount
FROM TABLE1 AS f
WHERE t.[Date] BETWEEN f.BeginDate AND f.EndDate
)
FROM ( SELECT DeptID
, BeginDate AS [Date]
FROM TABLE1
UNION
SELECT DeptID
, EndDate AS [Date]
FROM TABLE1
) AS t
EDIT since OP clarified that he wants all the dates here is the updated solution
I have excluded a Emplyee from Count if his job is ending on that date.But if you want to include change t.[Date] < f.EndDate to t.[Date] <= f.EndDate in the below solution. Plus I assume the NULL value in EndDate mean Employee still works for Department.
DECLARE #StartDate DATE = (SELECT MIN(StartDate) FROM Table1)
,#EndDate DATE = (SELECT MAX(EndDate) FROM Table1)
;WITH CTE AS
(
SELECT DISTINCT DeptID,#StartDate AS [Date] FROM Table1
UNION ALL
SELECT c.DeptID, DATEADD(dd,1,c.[Date]) AS [Date] FROM CTE AS c
WHERE c.[Date]<=#EndDate
)
SELECT * ,
EmployeeCount=( SELECT COUNT(EmployeeID)
FROM TABLE1 AS f
WHERE f.DeptID=t.DeptID AND t.[Date] >= f.StartDate
AND ( t.[Date] < f.EndDate OR f.EndDate IS NULL )
)
FROM CTE AS t
ORDER BY 1
OPTION ( MAXRECURSION 0 )
here is SQL Fiddler demo.I have added another department and added an Employee to it.
http://sqlfiddle.com/#!3/5c4ec/1