Break periods at the end of the month - sql

SQL Server 2017
CREATE TABLE [TABLE_1]
(
PLAN_NR decimal(28,6) NULL,
START_DATE datetime NULL,
);
INSERT INTO TABLE_1 (PLAN_NR, START_DATE)
VALUES (1,'2020-05-01'), (2,'2020-08-01');
CREATE TABLE [TABLE_2]
(
PLAN_NR decimal(28,6) NULL,
PERIOD_NR decimal(28,6) NOT NULL
);
INSERT INTO TABLE_2 (PLAN_NR, PERIOD_NR)
VALUES (1, 1), (1, 2), (1, 3), (1, 4), (1, 5), (1, 6), (1, 7), (1, 8),
(2, 1), (2, 2), (2, 3), (2, 4), (2, 5), (2, 6), (2, 7), (2, 8);
TABLE_1 holds each plan's number and start date.
TABLE_2 holds the period numbers for each plan number.
I would like to compute the corresponding period start dates:
Each period is exactly 7 days long, unless the period contains a month end. In that case the period should be split into two parts: one that runs up to and including the last day of the month, and one that starts on the first day of the next month.
The Select:
SELECT
t1.PLAN_NR, t2.PERIOD_NR,
FORMAT(DATEADD (d ,((t2.PERIOD_NR-1)*7) , t1.START_DATE ),'yyyy-MM-dd') START_DATE
FROM
TABLE_1 t1
JOIN
TABLE_2 t2 ON t1.PLAN_NR = t2.PLAN_NR
ORDER BY
t1.PLAN_NR, t2.PERIOD_NR ASC
This returns the start dates, but without the extra rows needed to account for each month end:
+---------+-----------+------------+
| PLAN_NR | PERIOD_NR | START_DATE |
+---------+-----------+------------+
| 1 | 1 | 2020-05-01 |
| 1 | 2 | 2020-05-08 |
| 1 | 3 | 2020-05-15 |
| 1 | 4 | 2020-05-22 |
| 1 | 5 | 2020-05-29 |
| 1 | 6 | 2020-06-05 |
| 1 | 7 | 2020-06-12 |
| 1 | 8 | 2020-06-19 |
| 2 | 1 | 2020-08-01 |
| 2 | 2 | 2020-08-08 |
| 2 | 3 | 2020-08-15 |
| 2 | 4 | 2020-08-22 |
| 2 | 5 | 2020-08-29 |
| 2 | 6 | 2020-09-05 |
| 2 | 7 | 2020-09-12 |
| 2 | 8 | 2020-09-19 |
+---------+-----------+------------+
I would like an output like this:
+---------+-----------+----------------------+
| PLAN_NR | PERIOD_NR | START_DATE |
+---------+-----------+----------------------+
| 1 | 1 | 2020-05-01 |
| 1 | 2 | 2020-05-08 |
| 1 | 3 | 2020-05-15 |
| 1 | 4 | 2020-05-22 |
| 1 | 5 | 2020-05-29 |< --- period part before new month
| 1 | 6 | 2020-06-01 |< --- period part after new month
| 1 | 7 | 2020-06-05 |
| 1 | 8 | 2020-06-12 |
| 2 | 1 | 2020-08-01 |
| 2 | 2 | 2020-08-08 |
| 2 | 3 | 2020-08-15 |
| 2 | 4 | 2020-08-22 |
| 2 | 5 | 2020-08-29 |< --- period part before new month
| 2 | 6 | 2020-09-01 |< --- period part after new month
| 2 | 7 | 2020-09-05 |
| 2 | 8 | 2020-09-12 |
+---------+-----------+----------------------+

Use window functions (LEAD / LAG) to flag the first and last period that starts in each month:
SELECT t1.PLAN_NR
, t2.PERIOD_NR
, FORMAT(DATEADD (d ,((t2.PERIOD_NR-1)*7) , t1.START_DATE ),'yyyy-MM-dd') START_DATE
, CASE
WHEN
lead(
FORMAT(DATEADD (d ,((t2.PERIOD_NR-1)*7) , t1.START_DATE ),'yyyy-MM-dd')
) over (partition by
FORMAT(DATEADD (d ,((t2.PERIOD_NR-1)*7) , t1.START_DATE ),'yyyy-MM')
order by t2.period_nr)
IS NULL THEN '< --- period part before new month'
WHEN lag(
FORMAT(DATEADD (d ,((t2.PERIOD_NR-1)*7) , t1.START_DATE ),'yyyy-MM-dd')
) over (partition by
FORMAT(DATEADD (d ,((t2.PERIOD_NR-1)*7) , t1.START_DATE ),'yyyy-MM')
order by t2.period_nr)
IS NULL THEN '< --- period part after new month'
END as period_break
from TABLE_1 t1
join TABLE_2 t2
on t1.PLAN_NR = t2.PLAN_NR
order by t1.PLAN_NR, t2.PERIOD_NR asc
PLAN_NR PERIOD_NR START_DATE period_break
1 1 2020-05-01 < --- period part after new month
1 2 2020-05-08 (null)
1 3 2020-05-15 (null)
1 4 2020-05-22 (null)
1 5 2020-05-29 < --- period part before new month
1 6 2020-06-05 < --- period part after new month
1 7 2020-06-12 (null)
1 8 2020-06-19 < --- period part before new month
2 1 2020-08-01 < --- period part after new month
2 2 2020-08-08 (null)
2 3 2020-08-15 (null)
2 4 2020-08-22 (null)
2 5 2020-08-29 < --- period part before new month
2 6 2020-09-05 < --- period part after new month
2 7 2020-09-12 (null)
2 8 2020-09-19 < --- period part before new month

SELECT
t1.PLAN_NR, t2.PERIOD_NR,
--row_number() over() but what if PERIOD_NR is not consecutive?
t2.PERIOD_NR + SUM(num.n) OVER(PARTITION BY t2.PLAN_NR ORDER BY t2.PERIOD_NR, num.n) AS PERIOD_NR_x,
FORMAT(CASE WHEN num.n = 1 THEN DATEADD(day, 1, EOMONTH(DATEADD (d ,((t2.PERIOD_NR-1)*7) , t1.START_DATE ))) ELSE DATEADD(d ,((t2.PERIOD_NR-1)*7) , t1.START_DATE ) END, 'yyyy-MM-dd') START_DATE
FROM
TABLE_1 t1
JOIN
TABLE_2 t2 ON t1.PLAN_NR = t2.PLAN_NR
CROSS APPLY
(
SELECT 0 AS n
UNION ALL
--new row for month change
SELECT 1 AS n
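WHERE DATEDIFF(month, DATEADD(d ,(t2.PERIOD_NR-1)*7 , t1.START_DATE), DATEADD(d ,t2.PERIOD_NR*7 - 1 , t1.START_DATE)) = 1 --compare with the period's last day (start + 6), so a period that ends exactly on a month end is not split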
) as num
ORDER BY
t1.PLAN_NR, t2.PERIOD_NR, num.n
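For readers who want to check the splitting rule outside the database, here is a minimal Python sketch of the same idea: emit each 7-day period start, plus an extra start on the first day of the next month whenever the period's last day falls in a later month. The function name `period_starts` is hypothetical and not part of the question or the answers. Like the CROSS APPLY query, it returns one more row than there are periods whenever a split occurs.

```python
from datetime import date, timedelta

def period_starts(plan_start, n_periods):
    """Start dates of consecutive 7-day periods, split at month ends."""
    starts = []
    for k in range(n_periods):
        begin = plan_start + timedelta(days=7 * k)
        last = begin + timedelta(days=6)  # last day covered by this period
        starts.append(begin)
        # A 7-day period can cross at most one month end.
        if (last.year, last.month) != (begin.year, begin.month):
            starts.append(last.replace(day=1))  # first day of the next month
    return starts

# Plan 1 from TABLE_1: starts on 2020-05-01 and has 8 periods.
plan_1 = period_starts(date(2020, 5, 1), 8)
for d in plan_1:
    print(d.isoformat())
```

The rows can then be renumbered in order to obtain the new period numbers, which is what the running SUM over num.n does in the query.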

Related

Generate multiple record from existing records based on interval columns [from and to]

I have two score types [M, B] in column 3 (S_TYPE). If the type is M, the score in column 6 (SCORE) is either S [scored] or SB [bonus scored]. Every interval [FROM_HRS - TO_HRS] of type B must have a corresponding SB row of type M; in other words, a type M row with score S must not overlap a type B interval. Unfortunately, I have several records that were captured incorrectly, as shown in the table below.
CREATE TABLE SCORE_TBL
(
ID int IDENTITY(1,1) PRIMARY KEY,
PERSONID_FK int NOT NULL,
S_TYPE varchar(50) NULL,
FROM_HRS int NULL,
TO_HRS int NULL,
SCORE varchar(50) NULL,
);
INSERT INTO SCORE_TBL(PERSONID_FK,S_TYPE,FROM_HRS,TO_HRS,SCORE)
VALUES
(1, 'M' , 0,20, 'S'),
(1, 'B',6, 8, 'B'),
(2, 'B',0, 2, 'B'),
(2, 'M',0,20, 'S'),
(2, 'B', 10,13, 'B'),
(2, 'B', 18,20, 'B'),
(2, 'M', 13,18, 'S');
| ID | PERSONID_FK |S_TYPE| FROM_HRS | TO_HRS | SCORE |
|----|-------------|------|----------|--------|-------|
| 1 | 1 | M | 0 | 20 | S |
| 2 | 1 | B | 6 | 8 | B |
| 3 | 2 | B | 0 | 2 | B |
| 4 | 2 | M | 0 | 20 | S |
| 5 | 2 | B | 10 | 13 | B |
| 6 | 2 | B | 18 | 20 | B |
| 7 | 2 | M | 13 | 18 | S |
I want the data to look like this
| ID | PERSONID_FK |S_TYPE| FROM_HRS | TO_HRS | SCORE |
|----|-------------|------|----------|--------|-------|
| 1 | 1 | M | 0 | 6 | S |
| 2 | 1 | M | 6 | 8 | SB |
| 3 | 1 | B | 6 | 8 | B |
| 4 | 1 | M | 8 | 20 | S |
| 5 | 2 | B | 0 | 2 | B |
| 6 | 2 | M | 0 | 2 | SB |
| 7 | 2 | M | 2 | 10 | S |
| 8 | 2 | B | 10 | 13 | B |
| 9 | 2 | M | 10 | 13 | SB |
| 10 | 2 | M | 13 | 18 | S |
| 11 | 2 | B | 18 | 20 | B |
| 12 | 2 | M | 18 | 20 | SB |
Any ideas on how to generate this data with a SELECT statement in SQL Server?
The tricky part here is that an interval might need to be split into several pieces, like 0..20 for person 2.
Window functions to the rescue. This query illustrates what you need to do:
WITH
deltas AS (
SELECT personid_fk, hrs, sum(delta_s) as delta_s, sum(delta_b) as delta_b
FROM (SELECT personid_fk, from_hrs as hrs,
case when score = 'S' then 1 else 0 end as delta_s,
case when score = 'B' then 1 else 0 end as delta_b
FROM score_tbl
UNION ALL
SELECT personid_fk, to_hrs as hrs,
case when score = 'S' then -1 else 0 end as delta_s,
case when score = 'B' then -1 else 0 end as delta_b
FROM score_tbl) _
GROUP BY personid_fk, hrs
),
running AS (
SELECT personid_fk, hrs as from_hrs,
lead(hrs) over (partition by personid_fk order by hrs) as to_hrs,
sum(delta_s) over (partition by personid_fk order by hrs) running_s,
sum(delta_b) over (partition by personid_fk order by hrs) running_b
FROM deltas
)
SELECT personid_fk, 'M' as s_type, from_hrs, to_hrs,
case when running_b > 0 then 'SB' else 'S' end as score
FROM running
WHERE running_s > 0
UNION ALL
SELECT personid_fk, s_type, from_hrs, to_hrs, score
FROM score_tbl
WHERE s_type = 'B'
ORDER BY personid_fk, from_hrs;
Step by step:
- deltas is a union of two passes over score_tbl, one for the start and one for the end of each score/bonus interval, which creates a timeline of +1/-1 events.
- running computes a running total of those deltas over time, which yields the split intervals in which a score and/or a bonus is active.
- The final query converts the running totals into score codes (SB while a bonus is active, S otherwise) and unions in the bonus intervals, which are passed through unchanged.
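The same event-timeline idea can be sketched in Python to check the expected rows by hand. This is only an illustration of the technique, not the answer's code: the function name `split_scores` and the tuple layout are assumptions made for this example.

```python
from collections import defaultdict

def split_scores(rows):
    """rows: (person, s_type, from_hrs, to_hrs, score) tuples as in SCORE_TBL."""
    # Timeline of +1/-1 events per (person, hour): [delta_s, delta_b].
    deltas = defaultdict(lambda: [0, 0])
    for person, _s_type, from_hrs, to_hrs, score in rows:
        if score == "S":
            idx = 0
        elif score == "B":
            idx = 1
        else:
            continue  # other score codes create no events
        deltas[(person, from_hrs)][idx] += 1
        deltas[(person, to_hrs)][idx] -= 1

    out = []
    running = defaultdict(lambda: [0, 0])  # running totals per person
    keys = sorted(deltas)
    for i, (person, hrs) in enumerate(keys):
        running[person][0] += deltas[(person, hrs)][0]
        running[person][1] += deltas[(person, hrs)][1]
        nxt = keys[i + 1] if i + 1 < len(keys) else None
        # Emit an M interval up to the next event while a score is active.
        if nxt is not None and nxt[0] == person and running[person][0] > 0:
            score = "SB" if running[person][1] > 0 else "S"
            out.append((person, "M", hrs, nxt[1], score))
    out.extend(r for r in rows if r[1] == "B")  # B rows pass through unchanged
    return sorted(out, key=lambda r: (r[0], r[2], r[1]))

score_tbl = [
    (1, "M", 0, 20, "S"), (1, "B", 6, 8, "B"),
    (2, "B", 0, 2, "B"), (2, "M", 0, 20, "S"), (2, "B", 10, 13, "B"),
    (2, "B", 18, 20, "B"), (2, "M", 13, 18, "S"),
]
for row in split_scores(score_tbl):
    print(row)
```

Overlapping M rows, such as 0..20 and 13..18 for person 2, simply raise the running score total to 2, so they add split points without producing duplicate intervals.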

SQL Server Get all Birthday Years

I have a table in SQL Server that is composed of:
ID, B_Day
1, 1977-02-20
2, 2001-03-10
...
I want to add a row to this table for each year of a birthday, up to the current birthday year, i.e.:
ID, B_Day
1,1977-02-20
1,1978-02-20
1,1979-02-20
...
1,2020-02-20
2, 2001-03-10
2, 2002-03-10
...
2, 2019-03-10
I'm struggling to determine the best strategy for accomplishing this. I thought about recursively self-joining, but that creates far too many layers. Any suggestions?
The following should work
with row_gen
as (select top 200 row_number() over(order by name)-1 as rnk
from master..spt_values
)
select a.id,a.b_day,dateadd(year,rnk,b_day) incr_b_day
from dbo.t a
join row_gen b
on dateadd(year,b.rnk,a.b_day)<=getdate()
https://dbfiddle.uk/?rdbms=sqlserver_2017&fiddle=0d06c95e1914ca45ca192d0d192bd2e0
You can use a recursive approach:
with cte as (
select t.id, t.b_day, convert(date, getdate()) as mx_dt
from mytable t -- placeholder name: use your own table (TABLE itself is a reserved word)
union all
select c.id, dateadd(year, 1, c.b_day), c.mx_dt
from cte c
where dateadd(year, 1, c.b_day) < c.mx_dt
)
select c.id, c.b_day
from cte c
order by c.id, c.b_day;
The default recursion limit is 100 levels. To go beyond it, add the query hint OPTION (MAXRECURSION 0) to the final statement, which removes the limit.
If your dataset is not too big, one option is to use a recursive query:
with cte as (
select id, b_day bday0, b_day, 1 lvl from mytable
union all
select
id,
bday0,
dateadd(year, lvl, bday0), lvl + 1
from cte
where dateadd(year, lvl, bday0) <= getdate()
)
select id, b_day from cte order by id, b_day
Demo on DB Fiddle:
id | b_day
-: | :---------
1 | 1977-02-20
1 | 1978-02-20
1 | 1979-02-20
1 | 1980-02-20
1 | 1981-02-20
1 | 1982-02-20
1 | 1983-02-20
1 | 1984-02-20
1 | 1985-02-20
1 | 1986-02-20
1 | 1987-02-20
1 | 1988-02-20
1 | 1989-02-20
1 | 1990-02-20
1 | 1991-02-20
1 | 1992-02-20
1 | 1993-02-20
1 | 1994-02-20
1 | 1995-02-20
1 | 1996-02-20
1 | 1997-02-20
1 | 1998-02-20
1 | 1999-02-20
1 | 2000-02-20
1 | 2001-02-20
1 | 2002-02-20
1 | 2003-02-20
1 | 2004-02-20
1 | 2005-02-20
1 | 2006-02-20
1 | 2007-02-20
1 | 2008-02-20
1 | 2009-02-20
1 | 2010-02-20
1 | 2011-02-20
1 | 2012-02-20
1 | 2013-02-20
1 | 2014-02-20
1 | 2015-02-20
1 | 2016-02-20
1 | 2017-02-20
1 | 2018-02-20
1 | 2019-02-20
1 | 2020-02-20
2 | 2001-03-01
2 | 2002-03-01
2 | 2003-03-01
2 | 2004-03-01
2 | 2005-03-01
2 | 2006-03-01
2 | 2007-03-01
2 | 2008-03-01
2 | 2009-03-01
2 | 2010-03-01
2 | 2011-03-01
2 | 2012-03-01
2 | 2013-03-01
2 | 2014-03-01
2 | 2015-03-01
2 | 2016-03-01
2 | 2017-03-01
2 | 2018-03-01
2 | 2019-03-01
2 | 2020-03-01
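As a cross-check of the row generation, here is a small Python sketch that always adds n years to the original birthday, as the last recursive query does, rather than adding one year to the previous row. The function name `birthdays_until` is hypothetical. The February 29 fallback mirrors how DATEADD(year, ...) behaves in SQL Server, which returns February 28 in non-leap years.

```python
from datetime import date

def birthdays_until(b_day, today):
    """Every anniversary of b_day up to and including today."""
    out = []
    n = 0
    while True:
        year = b_day.year + n
        try:
            d = b_day.replace(year=year)
        except ValueError:
            # b_day is Feb 29 and `year` is not a leap year.
            d = date(year, 2, 28)
        if d > today:
            break
        out.append(d)
        n += 1
    return out

rows = birthdays_until(date(1977, 2, 20), date(2020, 6, 1))
print(rows[0], rows[-1], len(rows))
```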

How to get record by date from 1 table and update other table in Postgresql?

I have two tables. One table (order_product) has multiple records by date, and the other table (transfer_product) also has multiple records by date. The order_product table holds the correct values. I want to update my transfer_product table from the order_product table by date range.
order_product_table
-------------------------
id | date | Product_id | value
-------------------------------------------
1 | 2017-07-01 | 2 | 53
2 | 2017-08-05 | 2 | 67
3 | 2017-10-02 | 2 | 83
4 | 2018-01-20 | 5 | 32
5 | 2018-05-01 | 5 | 53
6 | 2008-08-05 | 6 | 67
Transfer_product_table
----------------------------
id | date | Product_id | value
--------------------------------------------
1 | 2017-08-01 | 2 | 10
2 | 2017-10-06 | 2 | 20
3 | 2017-12-12 | 2 | 31
4 | 2018-06-25 | 5 | 5
Result(Transfer_product_table)
--------------------------------
id | date | Product_id | value
--------------------------------------------
1 | 2017-08-01 | 2 | 53
2 | 2017-10-06 | 2 | 83
3 | 2017-12-12 | 2 | 83
4 | 2018-06-25 | 5 | 53
I want to update the values by date, as you can see in the result table.
I tried a query using PARTITION BY, but it does not do what I want:
UPDATE Transfer_product_table imp
SET value = sub.value
FROM (SELECT product_id,value
,ROW_NUMBER() OVER (PARTITION BY product_id ORDER BY orderdate DESC)AS Rno
FROM order_product_table
where orderdate between '2017-07-01' and '2019-10-31') sub
WHERE imp.product_id = sub.product_id
and sub.Rno=1
and imp.date between '2017-07-01' and '2019-10-31'
This is pretty straightforward using postgres' awesome daterange type.
with order_product_table as (
select * from (
VALUES (1, '2017-07-01'::date, 2, 53),
(2, '2017-08-05', 2, 67),
(3, '2017-10-02', 2, 83),
(4, '2018-01-20', 5, 32),
(5, '2018-05-01', 5, 53),
(6, '2008-08-05', 6, 67)
) v(id, date, product_id, value)
), transfer_product_table as (
select * from (
VALUES (1, '2017-08-01'::date, 2, 10),
(2, '2017-10-06', 2, 20),
(3, '2017-12-12', 2, 31),
(4, '2018-06-25', 5, 5)
) v(id, date, product_id, value)
), price_ranges AS (
select product_id,
daterange(date, lead(date) OVER (PARTITION BY product_id order by date), '[)') as pricerange,
value
FROM order_product_table
)
SELECT id,
date,
transfer_product_table.product_id,
price_ranges.value
FROM transfer_product_table
JOIN price_ranges ON price_ranges.product_id = transfer_product_table.product_id
AND date <@ pricerange
ORDER BY id
;
id | date | product_id | value
----+------------+------------+-------
1 | 2017-08-01 | 2 | 53
2 | 2017-10-06 | 2 | 83
3 | 2017-12-12 | 2 | 83
4 | 2018-06-25 | 5 | 53
(4 rows)
Basically, we figure out the price at any given date by using the order_product_table. We get the price between the current date (inclusive) and the next date (exclusive) with this:
daterange(date, lead(date) OVER (PARTITION BY product_id order by date), '[)') as pricerange,
Then we simply join to this on the condition that the product_ids match and that date in the transfer_product_table is contained by the pricerange.
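The half-open range [date, next date) means each transfer row takes the value of the latest order on or before its date. A minimal Python sketch of that lookup follows, using the sample rows from the question; the function name `value_at` is an assumption made for this example and has no counterpart in the SQL.

```python
from bisect import bisect_right
from datetime import date

def value_at(orders, product_id, on):
    """Value of the latest order for product_id dated on or before `on`, else None."""
    history = sorted((d, v) for d, p, v in orders if p == product_id)
    # Index of the first order dated strictly after `on`.
    i = bisect_right([d for d, _ in history], on)
    return history[i - 1][1] if i else None

# (date, product_id, value) rows from order_product_table.
orders = [
    (date(2017, 7, 1), 2, 53), (date(2017, 8, 5), 2, 67), (date(2017, 10, 2), 2, 83),
    (date(2018, 1, 20), 5, 32), (date(2018, 5, 1), 5, 53), (date(2008, 8, 5), 6, 67),
]
print(value_at(orders, 2, date(2017, 8, 1)))
```

A transfer dated before the first order of its product falls into no range, so the join drops it in SQL and the sketch returns None.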

Count previous row only if > 2 days later

In SQL Server 2012 (using Management Studio), I need to display the count of distinct client numbers (CN) that have a re-entry, grouped by Type, like this:
Type CountOfCN
5 1
10 3
Only a re-entry counts, and it must start more than 2 days after the end of the previous entry for that client number. So ENTRY_NO 1 never counts. ENTRY_NO 2 counts if its start date is more than 2 days after the end date of ENTRY_NO 1, and so on for ENTRY_NO 3, 4, 5.
I got ENTRY_NO from ROW_NUMBER when I created the table. I have no idea how to write a DATEDIFF or DATEADD expression (?) that looks at the previous row's end date and compares it with the current row's start date for each CN.
Here is my table:
CN STARTDATE ENDDATE TYPE ENTRY_NO
1 1/1/2018 1/20/2018 10 1
1 1/21/2018 1/30/2018 5 2
1 2/3/2018 NULL 10 3
2 1/1/2018 1/20/2018 10 1
2 1/27/2018 1/30/2018 10 2
3 1/1/2018 1/20/2018 5 1
3 1/27/2018 1/30/2018 10 2
3 2/10/2018 2/20/2018 5 3
4 1/7/2018 1/30/2018 5 1
5 1/27/2018 1/30/2018 5 1
5 1/31/2018 NULL 5 2
So the rows that should be in the results are ENTRY_NO 3 for CN 1, ENTRY_NO 2 for CN 2, and ENTRY_NO 2 & 3 for CN 3.
Only the last entry for a CN may have a NULL end date.
Using the LAG window function you can get the previous enddate.
SELECT *
FROM
(
SELECT * ,
LAG(ENDDATE) OVER (PARTITION BY CN ORDER BY STARTDATE) AS prevEndDate
FROM yourtable
) q
WHERE DATEDIFF(d, prevEndDate, STARTDATE) > 2 -- first entries have no previous row, so prevEndDate is NULL and they drop out
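To check the rule against the sample data, here is a small Python sketch of the LAG comparison followed by the grouping step from the question: count distinct CN per Type for entries that start more than 2 days after the previous entry's end date. The function name `reentry_counts` and the tuple layout are assumptions for illustration only.

```python
from collections import defaultdict
from datetime import date

def reentry_counts(rows):
    """rows: (cn, startdate, enddate_or_None, type). Returns {type: distinct CN count}."""
    by_cn = defaultdict(list)
    for cn, start, end, typ in rows:
        by_cn[cn].append((start, end, typ))
    clients = defaultdict(set)
    for cn, entries in by_cn.items():
        entries.sort(key=lambda e: e[0])  # ORDER BY STARTDATE
        for prev, cur in zip(entries, entries[1:]):  # prev plays the role of LAG
            prev_end = prev[1]
            if prev_end is not None and (cur[0] - prev_end).days > 2:
                clients[cur[2]].add(cn)
    return {typ: len(cns) for typ, cns in clients.items()}

data = [
    (1, date(2018, 1, 1), date(2018, 1, 20), 10),
    (1, date(2018, 1, 21), date(2018, 1, 30), 5),
    (1, date(2018, 2, 3), None, 10),
    (2, date(2018, 1, 1), date(2018, 1, 20), 10),
    (2, date(2018, 1, 27), date(2018, 1, 30), 10),
    (3, date(2018, 1, 1), date(2018, 1, 20), 5),
    (3, date(2018, 1, 27), date(2018, 1, 30), 10),
    (3, date(2018, 2, 10), date(2018, 2, 20), 5),
    (4, date(2018, 1, 7), date(2018, 1, 30), 5),
    (5, date(2018, 1, 27), date(2018, 1, 30), 5),
    (5, date(2018, 1, 31), None, 5),
]
print(reentry_counts(data))
```

On the sample table this gives 1 client for Type 5 and 3 clients for Type 10, which matches the result requested in the question.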
Inner join the table to itself on the conditions you want to enforce:
- The row being counted can't be Entry_No 1
- The Entry_No on one side is one greater than on the other side
- The earlier entry's ENDDATE must be more than 2 days before the later entry's STARTDATE
- Both sides of the join have the same CN
Use that join to create a CTE or derived table, then SELECT from it, grouping by Type and taking COUNT(DISTINCT CN).
So this ended up being more involved than I first thought, but here it goes...
You can run this example in SSMS.
Create a temporary table matching your definition above:
CREATE TABLE #data ( CN INT, STARTDATE DATETIME, ENDDATE DATETIME, [TYPE] INT, ENTRY_NO INT );
Insert data given:
INSERT INTO #data ( CN, STARTDATE, ENDDATE, [TYPE], ENTRY_NO ) VALUES
( 1, '1/1/2018', '1/20/2018', 10, 1 )
, ( 1, '1/21/2018', '1/30/2018', 5, 2 )
, ( 1, '2/3/2018', NULL, 10, 3 )
, ( 2, '1/1/2018', '1/20/2018', 10, 1 )
, ( 2, '1/27/2018', '1/30/2018', 10, 2 )
, ( 3, '1/1/2018', '1/20/2018', 5, 1 )
, ( 3, '1/27/2018', '1/30/2018', 10, 2 )
, ( 3, '2/10/2018', '2/20/2018', 5, 3 )
, ( 4, '1/7/2018', '1/30/2018', 5, 1 )
, ( 5, '1/27/2018', '1/30/2018', 5, 1 )
, ( 5, '1/31/2018', NULL, 5, 2 );
Confirm inserted data:
+----+-------------------------+-------------------------+------+----------+
| CN | STARTDATE | ENDDATE | TYPE | ENTRY_NO |
+----+-------------------------+-------------------------+------+----------+
| 1 | 2018-01-01 00:00:00.000 | 2018-01-20 00:00:00.000 | 10 | 1 |
| 1 | 2018-01-21 00:00:00.000 | 2018-01-30 00:00:00.000 | 5 | 2 |
| 1 | 2018-02-03 00:00:00.000 | NULL | 10 | 3 |
| 2 | 2018-01-01 00:00:00.000 | 2018-01-20 00:00:00.000 | 10 | 1 |
| 2 | 2018-01-27 00:00:00.000 | 2018-01-30 00:00:00.000 | 10 | 2 |
| 3 | 2018-01-01 00:00:00.000 | 2018-01-20 00:00:00.000 | 5 | 1 |
| 3 | 2018-01-27 00:00:00.000 | 2018-01-30 00:00:00.000 | 10 | 2 |
| 3 | 2018-02-10 00:00:00.000 | 2018-02-20 00:00:00.000 | 5 | 3 |
| 4 | 2018-01-07 00:00:00.000 | 2018-01-30 00:00:00.000 | 5 | 1 |
| 5 | 2018-01-27 00:00:00.000 | 2018-01-30 00:00:00.000 | 5 | 1 |
| 5 | 2018-01-31 00:00:00.000 | NULL | 5 | 2 |
+----+-------------------------+-------------------------+------+----------+
Run SQL to get type count given your business rules:
ENTRY_NO must be greater than 1
The current entry's ENDDATE must be more than 2 days after the previous entry's ENDDATE for the same CN
T-SQL:
SELECT
[TYPE], COUNT( DISTINCT CN ) AS ClientCount
FROM #data
WHERE
CN IN (
SELECT DISTINCT CN FROM (
SELECT
dat.CN
, dat.ENTRY_NO
, dat.[TYPE]
, DATEDIFF( DD
, LAG( ENDDATE, 1, NULL ) OVER ( PARTITION BY CN ORDER BY CN, ENDDATE ) -- gets enddate for previous CN entry
, ENDDATE
) AS DayDiff
FROM #data dat
) AS Clients
WHERE
Clients.ENTRY_NO >= 2
AND Clients.DayDiff > 2
)
GROUP BY
[TYPE]
ORDER BY
[TYPE];
Returns:
+------+-------------+
| TYPE | ClientCount |
+------+-------------+
| 5 | 2 |
| 10 | 3 |
+------+-------------+
A quick look at the IN subquery shows us that CNs 1, 2, and 3 will be included during the "TYPE" count.
SELECT
dat.CN
, dat.ENTRY_NO
, dat.[TYPE]
, DATEDIFF( DD
, LAG( ENDDATE, 1, NULL ) OVER ( PARTITION BY CN ORDER BY CN, ENDDATE ) -- gets enddate for previous CN entry
, ENDDATE
) AS DayDiff
FROM #data dat
ORDER BY
dat.CN, dat.ENTRY_NO;
+----+----------+------+---------+
| CN | ENTRY_NO | TYPE | DayDiff |
+----+----------+------+---------+
| 1 | 1 | 10 | NULL |
| 1 | 2 | 5 | 10 |
| 1 | 3 | 10 | NULL |
| 2 | 1 | 10 | NULL |
| 2 | 2 | 10 | 10 |
| 3 | 1 | 5 | NULL |
| 3 | 2 | 10 | 10 |
| 3 | 3 | 5 | 21 |
| 4 | 1 | 5 | NULL |
| 5 | 1 | 5 | NULL |
| 5 | 2 | 5 | NULL |
+----+----------+------+---------+

Weekly Average Reports: Redshift

My sales data for the first two weeks of June, for the Mondays (1st June and 8th June), is below:
date | count
2015-06-01 03:25:53 | 1
2015-06-01 03:28:51 | 1
2015-06-01 03:49:16 | 1
2015-06-01 04:54:14 | 1
2015-06-01 08:46:15 | 1
2015-06-01 13:14:09 | 1
2015-06-01 16:20:13 | 5
2015-06-01 16:22:13 | 1
2015-06-01 16:27:07 | 1
2015-06-01 16:29:57 | 1
2015-06-01 19:16:45 | 1
2015-06-08 10:54:46 | 1
2015-06-08 15:12:10 | 1
2015-06-08 20:35:40 | 1
I need to find the weekly average of sales that happened in a given range.
Complex query:
(some_manipulation_part), ifact as
( select date, sales_count from final_result_set
) select date_part('h', date) as h,
date_part('dow', date) as day_of_week,
count(sales_count)
from final_result_set
group by h, day_of_week;
Output :
h | day_of_week | count
3 | 1 | 3
4 | 1 | 1
8 | 1 | 1
10 | 1 | 1
13 | 1 | 1
15 | 1 | 1
16 | 1 | 8
19 | 1 | 1
20 | 1 | 1
If I apply avg to the same query, it does not return the correct answer:
(some_manipulation_part), ifact as
( select date, sales_count from final_result_set
) select date_part('h', date) as h,
date_part('dow', date) as day_of_week,
avg(sales_count)
from final_result_set
group by h, day_of_week;
h | day_of_week | count
3 | 1 | 1
4 | 1 | 1
8 | 1 | 1
10 | 1 | 1
13 | 1 | 1
15 | 1 | 1
16 | 1 | 1
19 | 1 | 1
20 | 1 | 1
So I have two Mondays in the given range, but the result is not divided by two. I am not even sure what is happening inside Redshift.
To get "weekly averages" use date_trunc():
SELECT date_trunc('week', my_date_column) as week
, avg(sales_count) AS avg_sales
FROM final_result_set
GROUP BY 1;
I hope you are not actually using date as the name of your date column. It is a reserved word in SQL and a basic type name; don't use it as an identifier.
If you group by the day of week (DOW), you get averages per weekday, and Sunday is 0. (Use ISODOW to get 7 for Sunday.)
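The question asks for a per-slot figure divided by the number of weeks, which a plain avg() over individual rows does not give. Here is a minimal Python sketch of that calculation, under the assumption that "weekly average" means the total per (hour, weekday) slot divided by the number of distinct weeks present in the input. The function name `weekly_avg_by_slot` is hypothetical, and weekdays use ISO numbering (Monday = 1, Sunday = 7).

```python
from collections import defaultdict
from datetime import datetime, timedelta

def weekly_avg_by_slot(sales):
    """sales: (timestamp, count) pairs. Returns {(hour, iso_weekday): total / weeks}."""
    totals = defaultdict(int)
    weeks = set()
    for ts, count in sales:
        # Identify the week by the date of its Monday.
        weeks.add((ts - timedelta(days=ts.weekday())).date())
        totals[(ts.hour, ts.isoweekday())] += count
    return {slot: total / len(weeks) for slot, total in totals.items()}

# A subset of the sample rows: two Mondays, 2015-06-01 and 2015-06-08.
sales = [
    (datetime(2015, 6, 1, 3, 25, 53), 1),
    (datetime(2015, 6, 1, 3, 28, 51), 1),
    (datetime(2015, 6, 1, 3, 49, 16), 1),
    (datetime(2015, 6, 1, 16, 20, 13), 5),
    (datetime(2015, 6, 1, 16, 22, 13), 1),
    (datetime(2015, 6, 1, 16, 27, 7), 1),
    (datetime(2015, 6, 1, 16, 29, 57), 1),
    (datetime(2015, 6, 8, 10, 54, 46), 1),
]
print(weekly_avg_by_slot(sales))
```

Dividing by the weeks present in the input undercounts weeks with no sales at all; if the reporting range is known, dividing by the number of weeks in that range may be the better choice.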