I'm trying to calculate the number of reports (report_user table) per user between two dates (Calendar table) and by day worked (agenda_user table).
Here is the diagram of my tables:
Calendar table :
DATE Year Month
---------------------------------
2020-01-01 2020 1
2020-01-02 2020 1
2020-01-03 2020 1
2020-01-04 2020 1
AGENDA_USER table :
ID_USER DATE Value
---------------------------------
1 2020-01-01 1
2 2020-01-01 1
1 2020-01-02 0
2 2020-01-02 1
User table :
ID_USER Name
-------------------------
1 Jack
2 Robert
Report_Result table :
ID_USER Date Result
-----------------------------------
1 2020-01-01 good
1 2020-01-01 good
2 2020-01-01 bad
2 2020-01-01 good
2 2020-01-02 good
2 2020-01-02 good
The result I'm trying to get with an SQL query:
ID_USER Date Number of report Day work report/work
---------------------------------------------------------------------------
1 2020-01-01 2 1 2/1 = 2
2 2020-01-01 2 1 1
1 2020-01-02 0 0 0
2 2020-01-02 2 1 2
SELECT
report.ID_USER,
COUNT(report.ID_USER) AS result
FROM [DB].[dbo].REPORT_USER AS report
JOIN [DB].[dbo].[USER] AS [USER]
ON [USER].ID_USER = report.ID_USER
JOIN [DB].[dbo].AGENDA_USER AS agenda
ON agenda.ID_USER = report.ID_USER
WHERE CAST(agenda.[Date] AS DATE) >= '2020-09-01'
AND CAST(agenda.[Date] AS DATE) <= '2021-07-28'
AND [USER].ID_user = 1167
GROUP BY
report.ID_VENDEUR;
I'm not entirely sure I understand your problem, but I think I'm reasonably close, so here is a start; point out my invalid assumptions and we can refine. More data, particularly in Agenda and Reports, would really help. An explanation is below (plus see the comments in the code).
The overall flow is to generate a list of days/people you want to report on (cteUserDays), generate a list of how many reports each user generated on each date (cteReps), generate a list of who worked on what days (cteWork), and then JOIN all 3 parts together using a LEFT OUTER JOIN so the report covers all workers on all days.
EDIT: Added cteRepRaw, where DATETIME is converted to DATE and "bad" reports are filtered out. Grouping and counting still happens in cteReps, but the join to cteUserDays was removed there because it was adding 1 to the count when the row was NULL.
DECLARE @Cal TABLE (CalDate DATETIME, CalYear int, CalMonth int)
DECLARE @Agenda TABLE (UserID int, CalDate DATE, AgendaVal int)
DECLARE @User TABLE (UserID int, UserName nvarchar(50))
DECLARE @Reps TABLE (UserID int, CalDate DATETIME, RepResult nvarchar(50))
INSERT INTO @Cal(CalDate, CalYear, CalMonth)
VALUES ('2020-01-01', 2020, 1), ('2020-01-02', 2020, 1), ('2020-01-03', 2020, 1), ('2020-01-04', 2020, 1)
INSERT INTO @Agenda(UserID, CalDate, AgendaVal)
VALUES (1, '2020-01-01', 1), (2, '2020-01-01', 1), (1, '2020-01-02', 0), (2, '2020-01-02', 1)
INSERT INTO @User (UserID, UserName)
VALUES (1, 'Jack'), (2, 'Robert')
INSERT INTO @Reps (UserID, CalDate, RepResult)
VALUES (1, '2020-01-01', 'good'), (1, '2020-01-01', 'good')
, (2, '2020-01-01', 'bad'), (2, '2020-01-01', 'good')
, (2, '2020-01-02', 'good'), (2, '2020-01-02', 'good')
; with cteUserDays as (
--First, you want zeros in your table where no reports are, so build a table for that
SELECT CONVERT(DATE, D.CalDate) as CalDate --EDIT add CONVERT here
, U.UserID FROM @Cal as D CROSS JOIN @User as U
WHERE D.CalDate >= '2020-01-01' AND D.CalDate <= '2021-07-28'
--EDIT Watch the <= date here, a DATE is < DATETIME with hours of the same day
), cteRepRaw as (--EDIT Add this CTE to convert DATETIME to DATE so we can group on it
--Map the DateTime to a DATE type so we can group reports from any time of day
SELECT R.UserID
, CONVERT(DATE, R.CalDate) as CalDate --EDIT add CONVERT here
, R.RepResult
FROM @Reps as R
WHERE R.RepResult='good' --EDIT Add this test to only count good ones
), cteReps as (
--Get the sum of all reports for a given user on a given day, though some might be missing (fill 0)
SELECT R.UserID , R.CalDate , COUNT(*) as Reports --SUM(COALESCE(R.RepResult, 0)) as Reports
FROM cteRepRaw as R--cteUserDays as D
--Some days may have no reports for that worker, so use a LEFT OUTER JOIN
--LEFT OUTER JOIN cteRepRaw as R on D.CalDate = R.CalDate AND D.UserID = R.UserID
GROUP BY R.UserID , R.CalDate
) , cteWork as (
--Unclear what values "value" in Agenda can take, but assuming it's some kind of work
-- unit, like "hours worked" or "shifts" so add them up
SELECT A.UserID , A.CalDate, SUM(A.AgendaVal) as DayWork FROM @Agenda as A
WHERE A.CalDate >= '2020-01-01' AND A.CalDate <= '2021-07-28'
GROUP BY A.CalDate, A.UserID
)
SELECT D.UserID , D.CalDate, COALESCE(R.Reports, 0) as Reports, W.DayWork
--NOTE: While it's probably a mistake to credit a report to a day a worker had
--no shifts, it could happen and would throw an error so check
, CASE WHEN W.DayWork > 0 THEN R.Reports / W.DayWork ELSE 0 END as RepPerWork
FROM cteUserDays as D
LEFT OUTER JOIN cteReps as R on D.CalDate=R.CalDate AND R.UserID = D.UserID
LEFT OUTER JOIN cteWork as W on D.UserID = W.UserID AND D.CalDate = W.CalDate
ORDER BY CalDate , UserID
First, as per the comments on your OP, "Agenda" represents when the user is working. You don't say how it's structured, so I'll assume it can have multiple entries for a given person on a given day (i.e. a 4-hour shift and an 8-hour shift) and I'll add them up to get total work (cteWork). I also assume that if somebody didn't work, you can't have a report for them. I check for this, but normally I'd expect your data validation to pre-screen those out.
Second, I'll assume reports are 1 per record, and a given user can have multiple per day. You have that in your sample data, but it's important to this solution, so I'm restating it in case somebody else reads this later.
Third, I assume you want all days reported for all users; I ensure this by generating a CROSS JOIN between users and days (cteUserDays).
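One caveat, based on an assumption about your column types (the post doesn't state them): if the report count and DayWork are both integers, then Reports / DayWork is integer division in T-SQL and truncates, so 1 report over 2 units of work would come out as 0 instead of 0.5. A quick illustration, plus the cast that would avoid it in the final SELECT:
SELECT 1 / 2 AS int_div                          -- returns 0
     , CAST(1 AS decimal(10,2)) / 2 AS dec_div   -- returns 0.5
-- applied to the final SELECT above, the same idea would be:
-- CASE WHEN W.DayWork > 0 THEN CAST(COALESCE(R.Reports, 0) AS decimal(10,2)) / W.DayWork ELSE 0 END as RepPerWork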
I browsed SO but could not quite find the exact answer or maybe it was for a different language.
Let's say I have a table, where each row is a record of a trade:
trade_id customer trade_date
1 A 2013-05-01 00:00:00
2 B 2013-05-01 10:00:00
3 A 2013-05-02 00:00:00
4 A 2013-05-05 00:00:00
5 B 2013-05-06 12:00:00
I would like to have the average time between trades, in days or fraction of days, for each customer, and the number of days since last trade. So for instance for customer A, time between trades 1 and 3 is 1 day and between trades 3 and 4 is 3 days, for an average of 2. So the end table would look like something like this (assuming today it's the 2013-05-10):
customer avg_time_btw_trades time_since_last_trade
A 2.0 5.0
B 5.08 3.5
If a customer has only got 1 trade I guess NULL is fine as output.
Not even sure SQL is the best way to do this (I am working with SQL server), but any help is appreciated!
SELECT
customer,
DATEDIFF(second, MIN(trade_date), MAX(trade_date)) / (NULLIF(COUNT(*), 1) - 1) / 86400.0,
DATEDIFF(second, MAX(trade_date), GETDATE() ) / 86400.0
FROM
yourTable
GROUP BY
customer
http://sqlfiddle.com/#!6/eb46e/7
EDIT: Added the final field that I didn't notice, apologies. (The logic: the average gap between trades is the total span, MAX(trade_date) minus MIN(trade_date), divided by the number of gaps, COUNT(*) - 1, since the intermediate dates cancel out; the query computes that span in seconds and divides by 86400.0 to get days, and NULLIF makes the result NULL for customers with only one trade.)
The following SQL script uses your data and gives the expected results.
DECLARE @temp TABLE
( trade_id INT,
customer CHAR(1),
trade_date DATETIME );
INSERT INTO @temp VALUES (1, 'A', '20130501');
INSERT INTO @temp VALUES (2, 'B', '20130501 10:00');
INSERT INTO @temp VALUES (3, 'A', '20130502');
INSERT INTO @temp VALUES (4, 'A', '20130505');
INSERT INTO @temp VALUES (5, 'B', '20130506 12:00');
DECLARE @getdate DATETIME
-- SET @getdate = getdate();
SET @getdate = '20130510';
SELECT s.customer
, AVG(s.days_btw_trades) AS avg_time_between_trades
, CAST(DATEDIFF(hour, MAX(s.trade_date), @getdate) AS float)
/ 24.0 AS time_since_last_trade
FROM (
SELECT CAST(DATEDIFF(HOUR, t2.trade_date, t.trade_date) AS float)
/ 24.0 AS days_btw_trades
, t.customer
, t.trade_date
FROM @temp t
LEFT JOIN @temp t2 ON t2.customer = t.customer
AND t2.trade_date = ( SELECT MAX(t3.trade_date)
FROM @temp t3
WHERE t3.customer = t.customer
AND t3.trade_date < t.trade_date)
) s
GROUP BY s.customer
You need the date difference between each trade and the one before it, and then you average them.
select
a.customer
,avg(datediff(a.trade_date, b.trade_date))
,datediff(now(),max(a.trade_date))
from yourTable a, yourTable b
where a.customer = b.customer
and b.trade_date = (
select max(trade_date)
from yourTable c
where c.customer = a.customer
and a.trade_date > c.trade_date)
#gets the one earlier date for every trade
group by a.customer
Just for grins I added a solution that uses CTEs. You could probably use a temp table if the first query is too large. I used @MatBailie's creation script for the table:
CREATE TABLE customer_trades (
id INT IDENTITY(1,1),
customer_id INT,
trade_date DATETIME,
PRIMARY KEY (id),
INDEX ix_user_trades (customer_id, trade_date)
)
INSERT INTO
customer_trades (
customer_id,
trade_date
)
VALUES
(1, '2013-05-01 00:00:00'),
(2, '2013-05-01 10:00:00'),
(1, '2013-05-02 00:00:00'),
(1, '2013-05-05 00:00:00'),
(2, '2013-05-06 12:00:00')
;
;WITH CTE as(
select customer_id, trade_date, datediff(hour,trade_date,ISNULL(LEAD(trade_date,1) over (partition by customer_id order by trade_date),GETDATE())) Trade_diff
from customer_trades
)
, CTE2 as
(SELECT customer_id, trade_diff, LAST_VALUE(trade_diff) OVER(Partition by customer_id order by trade_date) Curr_Trade from CTE)
SELECT Customer_id, AVG(trade_diff) AV, Max(Curr_Trade) Curr_Trade
FROM CTE2
GROUP BY customer_id
I have to make a daily report from an SQL table which contains ID (user id), Timestamp, and Balance for each transaction.
My task: every transaction is stored in the table. I have to know the sum of all users' balances for every day.
For example:
27/06/2016 8:10 User1 50$
27/06/2016 10:22 User1 75$
27/06/2016 11:32 User2 10$
28/06/2016 09:22 User3 40$
28/06/2016 17:35 User1 22$
In this case the results have to be the following:
27/06/2016: 85$ (75+10), because user1's last balance is 75 and user2's is 10
28/06/2016: 72$ (22+10+40), because user1's last balance is 22, user2's is 10 (it was last changed the day before, but I still have to count it!!!), and user3's is 40$
Please help.
Thanks
My solution, but it is not correct: it only produces a result if a transaction happened on that day, and it does not carry forward the previous day's balance.
Here are the queries I tried so far:
Query 1:
USE DB1;
GO
WITH cte (bin, currency, id, currentbalance, currentledgerbalance, dt) as ( SELECT bin, w.currency, w.id,t.currentbalance, t.currentledgerbalance, t.dt
FROM [DB1].[Tb1] b
inner join [tb2] c
on b.[id]=c.[id]
inner join tb3 w
on c.id=w.id and w.currency=b.currency
inner join [DB1].[tb4] t
on t.walletid=w.id )
, CTE2 (bin,currency,id,currentbalance,currentledgerbalance,dt) as (
select *
from cte
where dt in (select MAX(dt) FROM cte GROUP BY currency,id,DAY(dt), MONTH(dt), YEAR(dt))
)
Query 2:
select currency
,cast(dt as date) as stat_day
,sum(currentbalance) as currentbalance
from CTE2
GROUP BY currency,cast(dt as date)
order by stat_day
GO
I am not sure what other tables are involved in the solution. Just giving you a generic solution based on the query given:
---Creating a test table
create table usertrans (tid int identity, tdate date, uname varchar(30),balance int);
insert into usertrans values ('06/27/2017','user1',50);
insert into usertrans values ('06/27/2017','user1',75);
insert into usertrans values ('06/27/2017','user2',10);
insert into usertrans values ('06/28/2017','user3',40);
insert into usertrans values ('06/28/2017','user1',22);
select * from usertrans
-- Retrieving (2017-06-28) balance
with UMaxTrans(UName,TID)
AS(
select uname, max(tid) AS TID from usertrans
WHERE TDate <= '2017-06-28'
group by uname)
select CAST(GETDATE() AS DATE) AS 'Today' , sum(Balance) FROM UserTrans UT
INNER JOIN UMaxTrans UMT ON UT.TID = UMT.TID;
-- Retrieving (2017-06-27) balance
with UMaxTrans(UName,TID)
AS(
select uname, max(tid) AS TID from usertrans
WHERE TDate <= '2017-06-27'
group by uname)
select CAST(GETDATE() AS DATE) AS 'Today' , sum(Balance) FROM UserTrans UT
INNER JOIN UMaxTrans UMT ON UT.TID = UMT.TID;
Logic:
We have users, and a user's balance might change any number of times in a given day, but while calculating the total balance across all users we have to consider each user's latest transaction. That's what the query does: we take the maximum transaction ID for a given user, and that is the latest balance.
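If you need that figure for every day at once instead of editing the date each time, here is a minimal sketch of one way to extend the same idea (it assumes the usertrans table created above and, like the queries above, treats the highest tid on or before a day as that user's latest balance):
with days as (select distinct tdate from usertrans),
     users as (select distinct uname from usertrans)
select d.tdate as stat_day
     , sum(lastbal.balance) as total_balance
from days d
cross join users u
cross apply (select top (1) ut.balance
             from usertrans ut
             where ut.uname = u.uname and ut.tdate <= d.tdate
             order by ut.tid desc) lastbal
group by d.tdate
order by d.tdate;
Users with no transaction on or before a given day simply drop out of the CROSS APPLY, which is what gives the 85 / 72 style totals described in the question.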
I want to get all times that an event is not taking place for each room. The start of the day is 9:00:00 and end is 22:00:00.
What my database looks like is this:
Event EventStart EventEnd Days Rooms DayStarts
CISC 3660 09:00:00 12:30:00 Monday 7-3 9/19/2014
MATH 2501 15:00:00 17:00:00 Monday:Wednesday 7-2 10/13/2014
CISC 1110 14:00:00 16:00:00 Monday 7-3 9/19/2014
I want to get the times that aren't in the database.
ex. For SelectedDate (9/19/2014) the table should return:
Room FreeTimeStart FreeTimeEnd
7-3 12:30:00 14:00:00
7-3 16:00:00 22:00:00
ex2. SelectedDate (10/13/2014):
Room FreeTimeStart FreeTimeEnd
7-2 9:00:00 15:00:00
7-2 17:00:00 22:00:00
What I have tried is something like this:
select * from Events where ________ NOT BETWEEN eventstart AND eventend;
But I do not know what to put in the place of the space.
This was a pretty complex request. SQL works best with sets, not with looking at data row by row. Here is what I came up with. To make it easier to figure out, I wrote it as a series of CTEs so I could work through the problem a step at a time. I am not saying that this is the best possible way to do it, but it doesn't require the use of any cursors. You need the Events table and a table of the room names (otherwise, you won't see a room that doesn't have any bookings).
Here is the query and I will explain the methodology.
DECLARE @Events TABLE (Event varchar(20), EventStart Time, EventEnd Time, Days varchar(50), Rooms varchar(10), DayStarts date)
INSERT INTO @Events
SELECT 'CISC 3660', '09:00:00', '12:30:00', 'Monday', '7-3', '9/19/2014' UNION
SELECT 'MATH 2501', '15:00:00', '17:00:00', 'Monday:Wednesday', '7-2', '10/13/2014' UNION
SELECT 'CISC 1110', '14:00:00', '16:00:00', 'Monday', '7-3', '9/19/2014'
DECLARE @Rooms TABLE (RoomName varchar(10))
INSERT INTO @Rooms
SELECT '7-2' UNION
SELECT '7-3'
DECLARE @SelectedDate date = '9/19/2014'
DECLARE @MinTimeInterval int = 30 --smallest time unit room can be reserved for
;WITH
D1(N) AS (
SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1 UNION ALL
SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1 UNION ALL
SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1
),
D2(N) AS (SELECT 1 FROM D1 a, D1 b),
D4(N) AS (SELECT 1 FROM D2 a, D2 b),
Numbers AS (SELECT TOP 3600 ROW_NUMBER() OVER (ORDER BY (SELECT NULL)) -1 AS Number FROM D4),
AllTimes AS
(SELECT CAST(DATEADD(n,Numbers.Number*@MinTimeInterval,'09:00:00') as time) AS m FROM Numbers
WHERE DATEADD(n,Numbers.Number*@MinTimeInterval,'09:00:00') <= '22:00:00'),
OccupiedTimes AS (
SELECT e.Rooms, ValidTimes.m
FROM @Events E
CROSS APPLY (SELECT m FROM AllTimes WHERE m BETWEEN CASE WHEN e.EventStart = '09:00:00' THEN e.EventStart ELSE DATEADD(n,1,e.EventStart) END and CASE WHEN e.EventEnd = '22:00:00' THEN e.EventEnd ELSE DATEADD(n,-1,e.EventEnd) END) ValidTimes
WHERE e.DayStarts = @SelectedDate
),
AllRoomsAllTimes AS (
SELECT * FROM @Rooms R CROSS JOIN AllTimes
), AllOpenTimes AS (
SELECT a.*, ROW_NUMBER() OVER( PARTITION BY (a.RoomName) ORDER BY a.m) AS pos
FROM AllRoomsAllTimes A
LEFT OUTER JOIN OccupiedTimes o ON a.RoomName = o.Rooms AND a.m = o.m
WHERE o.m IS NULL
), Finalize AS (
SELECT a1.RoomName,
CASE WHEN a3.m IS NULL OR DATEDIFF(n,a3.m, a1.m) > @MinTimeInterval THEN a1.m else NULL END AS FreeTimeStart,
CASE WHEN a2.m IS NULL OR DATEDIFF(n,a1.m,a2.m) > @MinTimeInterval THEN A1.m ELSE NULL END AS FreeTimeEnd,
ROW_NUMBER() OVER( ORDER BY a1.RoomName ) AS Pos
FROM AllOpenTimes A1
LEFT OUTER JOIN AllOpenTimes A2 ON a1.RoomName = a2.RoomName and a1.pos = a2.pos-1
LEFT OUTER JOIN AllOpenTimes A3 ON a1.RoomName = a3.RoomName and a1.pos = a3.pos+1
WHERE A2.m IS NULL OR DATEDIFF(n,a1.m,a2.m) > @MinTimeInterval
OR
A3.m IS NULL OR DATEDIFF(n,a3.m, a1.m) > @MinTimeInterval
)
SELECT F1.RoomName, f1.FreeTimeStart, f2.FreeTimeEnd FROM Finalize F1
LEFT OUTER JOIN Finalize F2 ON F1.Pos = F2.pos-1 AND f1.RoomName = f2.RoomName
WHERE f1.pos % 2 = 1
In the first several lines, I create table variables to simulate your tables Events and Rooms.
The variable @MinTimeInterval determines what time interval the room schedules can be on (every 30 min, 15 min, etc. - this number needs to divide evenly into 60).
Since SQL cannot query data that is missing, we need to create a table that holds all of the times that we want to check for. The first several lines in the WITH create a table called AllTimes which are all the possible time intervals in your day.
Next, we get a list of all of the times that are occupied (OccupiedTimes), and then LEFT OUTER JOIN this table to the AllTimes table, which gives us all the available times. Since we only want the start and end of each free time, we create the Finalize table, which self-joins each record to the previous and next record in the table. If the gap between the times in these rows is greater than @MinTimeInterval, then we know it is either a start or an end of a free time.
Finally we self join this last table to put the start and end times in the same row and only look at every other row.
This will need to be adjusted if a single row in Events spans multiple days or multiple rooms.
Here's a solution that will return the "complete picture" including rooms that aren't booked at all for the day in question:
Declare @Date char(8) = '20141013'
;
WITH cte as
(
SELECT *
FROM -- use your table name instead of the VALUES construct
(VALUES
('09:00:00','12:30:00' ,'7-3', '20140919'),
('15:00:00','17:00:00' ,'7-2', '20141013'),
('14:00:00','16:00:00' ,'7-3', '20140919')) x(EventStart , EventEnd,Rooms, DayStarts)
), cte_Days_Rooms AS
-- get a cartesian product for the day specified and all rooms as well as the start and end time to compare against
(
SELECT y.EventStart,y.EventEnd, x.rooms,a.DayStarts FROM
(SELECT @Date DayStarts) a
CROSS JOIN
(SELECT DISTINCT Rooms FROM cte)x
CROSS JOIN
(SELECT '09:00:00' EventStart,'09:00:00' EventEnd UNION ALL
SELECT '22:00:00' EventStart,'22:00:00' EventEnd) y
), cte_1 AS
-- Merge the original data an the "base data"
(
SELECT * FROM cte WHERE DayStarts=@Date
UNION ALL
SELECT * FROM cte_Days_Rooms
), cte_2 as
-- use the ROW_NUMBER() approach to sort the data
(
SELECT *, ROW_NUMBER() OVER(PARTITION BY DayStarts, Rooms ORDER BY EventStart) as pos
FROM cte_1
)
-- final query: self join with an offest of one row, eliminating duplicate rows if a room is booked starting 9:00 or ending 22:00
SELECT c2a.DayStarts, c2a.Rooms , c2a.EventEnd, c2b.EventStart
FROM cte_2 c2a
INNER JOIN cte_2 c2b on c2a.DayStarts = c2b.DayStarts AND c2a.Rooms =c2b.Rooms AND c2a.pos = c2b.pos -1
WHERE c2a.EventEnd <> c2b.EventStart
ORDER BY c2a.DayStarts, c2a.Rooms
How do you create a moving average in SQL?
Current table:
Date Clicks
2012-05-01 2,230
2012-05-02 3,150
2012-05-03 5,520
2012-05-04 1,330
2012-05-05 2,260
2012-05-06 3,540
2012-05-07 2,330
Desired table or output:
Date Clicks 3 day Moving Average
2012-05-01 2,230
2012-05-02 3,150
2012-05-03 5,520 4,360
2012-05-04 1,330 3,330
2012-05-05 2,260 3,120
2012-05-06 3,540 3,320
2012-05-07 2,330 3,010
This is an evergreen Joe Celko question.
I don't know which DBMS platform is used, but in any case Joe was able to answer it more than 10 years ago with standard SQL.
Joe Celko SQL Puzzles and Answers citation:
"That last update attempt suggests that we could use the predicate to
construct a query that would give us a moving average:"
SELECT S1.sample_time, AVG(S2.load) AS avg_prev_hour_load
FROM Samples AS S1, Samples AS S2
WHERE S2.sample_time
BETWEEN (S1.sample_time - INTERVAL 1 HOUR)
AND S1.sample_time
GROUP BY S1.sample_time;
Is the extra column or the query approach better? The query is
technically better because the UPDATE approach will denormalize the
database. However, if the historical data being recorded is not going
to change and computing the moving average is expensive, you might
consider using the column approach.
MS SQL Example:
CREATE TABLE #TestDW
( Date1 datetime,
LoadValue Numeric(13,6)
);
INSERT INTO #TestDW VALUES('2012-06-09' , '3.540' );
INSERT INTO #TestDW VALUES('2012-06-08' , '2.260' );
INSERT INTO #TestDW VALUES('2012-06-07' , '1.330' );
INSERT INTO #TestDW VALUES('2012-06-06' , '5.520' );
INSERT INTO #TestDW VALUES('2012-06-05' , '3.150' );
INSERT INTO #TestDW VALUES('2012-06-04' , '2.230' );
SQL Puzzle query:
SELECT S1.date1, AVG(S2.LoadValue) AS avg_prev_3_days
FROM #TestDW AS S1, #TestDW AS S2
WHERE S2.date1
BETWEEN DATEADD(d, -2, S1.date1 )
AND S1.date1
GROUP BY S1.date1
order by 1;
One way to do this is to join on the same table a few times.
select
(Current.Clicks
+ isnull(P1.Clicks, 0)
+ isnull(P2.Clicks, 0)
+ isnull(P3.Clicks, 0)) / 4 as MovingAvg3
from
MyTable as Current
left join MyTable as P1 on P1.Date = DateAdd(day, -1, Current.Date)
left join MyTable as P2 on P2.Date = DateAdd(day, -2, Current.Date)
left join MyTable as P3 on P3.Date = DateAdd(day, -3, Current.Date)
Adjust the DateAdd component of the ON-Clauses to match whether you want your moving average to be strictly from the past-through-now or days-ago through days-ahead.
This works nicely for situations where you need a moving average over only a few data points.
This is not an optimal solution for moving averages with more than a few data points.
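If you are on SQL Server 2012 or later, a windowed AVG avoids the repeated self-joins and scales to longer windows. A minimal sketch against the same MyTable (the CAST is only there to avoid integer averaging; note the first two rows will show a partial average rather than being blank as in the desired output):
select [Date]
     , Clicks
     , avg(cast(Clicks as decimal(10,2)))
           over (order by [Date] rows between 2 preceding and current row) as MovingAvg3
from MyTable
order by [Date];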
select t2.date, round(sum(ct.clicks)/3) as avg_clicks
from
(select date from clickstable) as t2,
(select date, clicks from clickstable) as ct
where datediff(t2.date, ct.date) between 0 and 2
group by t2.date
Obviously you can change the interval to whatever you need. You could also use count() instead of a magic number to make it easier to change, but that will also slow it down.
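For example, the count() variant of the same query (a sketch, keeping the table name and MySQL-style datediff used above) just divides by however many rows actually fall inside the window:
select t2.date, round(sum(ct.clicks) / count(ct.clicks)) as avg_clicks
from
(select date from clickstable) as t2,
(select date, clicks from clickstable) as ct
where datediff(t2.date, ct.date) between 0 and 2
group by t2.date;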
General template for rolling averages that scales well for large data sets
WITH moving_avg AS (
SELECT 0 AS [lag] UNION ALL
SELECT 1 AS [lag] UNION ALL
SELECT 2 AS [lag] UNION ALL
SELECT 3 AS [lag] --ETC
)
SELECT
DATEADD(day,[lag],[date]) AS [reference_date],
[otherkey1],[otherkey2],[otherkey3],
AVG([value1]) AS [avg_value1],
AVG([value2]) AS [avg_value2]
FROM [data_table]
CROSS JOIN moving_avg
GROUP BY [otherkey1],[otherkey2],[otherkey3],DATEADD(day,[lag],[date])
ORDER BY [otherkey1],[otherkey2],[otherkey3],[reference_date];
And for weighted rolling averages:
WITH weighted_avg AS (
SELECT 0 AS [lag], 1.0 AS [weight] UNION ALL
SELECT 1 AS [lag], 0.6 AS [weight] UNION ALL
SELECT 2 AS [lag], 0.3 AS [weight] UNION ALL
SELECT 3 AS [lag], 0.1 AS [weight] --ETC
)
SELECT
DATEADD(day,[lag],[date]) AS [reference_date],
[otherkey1],[otherkey2],[otherkey3],
AVG([value1] * [weight]) / AVG([weight]) AS [wavg_value1],
AVG([value2] * [weight]) / AVG([weight]) AS [wavg_value2]
FROM [data_table]
CROSS JOIN weighted_avg
GROUP BY [otherkey1],[otherkey2],[otherkey3],DATEADD(day,[lag],[date])
ORDER BY [otherkey1],[otherkey2],[otherkey3],[reference_date];
select *
, (select avg(c2.clicks) from #clicks_table c2
where c2.date between dateadd(dd, -2, c1.date) and c1.date) mov_avg
from #clicks_table c1
Use a different join predicate:
SELECT current.date
,avg(periods.clicks)
FROM current left outer join current as periods
ON current.date BETWEEN dateadd(d,-2, periods.date) AND periods.date
GROUP BY current.date HAVING COUNT(*) >= 3
The HAVING clause will prevent any dates without at least N values from being returned.
assume x is the value to be averaged and xDate is the date value:
SELECT avg(x) from myTable WHERE xDate BETWEEN dateadd(d, -2, xDate) and xDate
In hive, maybe you could try
select date, clicks, avg(clicks) over (order by date rows between 2 preceding and current row) as moving_avg from clicktable;
For this purpose, I'd like to create an auxiliary/dimensional date table like
create table date_dim(date date, date_1 date, date_2 date, date_3 date, ...)
where date is the key and date_1, date_2, date_3, ... are the same day, one day back, two days back, and so on.
Then you can do an equi-join in Hive.
Using a view like:
select date, date from date_dim
union all
select date, date_add(date, -1) from date_dim
union all
select date, date_add(date, -2) from date_dim
union all
select date, date_add(date, -3) from date_dim
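The final join against that view isn't shown above; here is a minimal sketch of what it might look like in Hive, assuming the view is created with named columns (date_window, anchor_date and win_date are my own illustrative names) and the clicks live in clicktable as in the query above:
-- hypothetical view name and column names, for illustration only
create view date_window (anchor_date, win_date) as
select date, date from date_dim
union all
select date, date_add(date, -1) from date_dim
union all
select date, date_add(date, -2) from date_dim;
select w.anchor_date, avg(c.clicks) as moving_avg
from date_window w
join clicktable c on c.date = w.win_date
group by w.anchor_date;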
NOTE: THIS IS NOT AN ANSWER but an enhanced code sample of Diego Scaravaggi's answer. I am posting it as an answer because the comment section is insufficient. Note that I have parameterized the period for the moving average.
declare @p int = 3
declare @t table(d int, bal float)
insert into @t values
(1,94),
(2,99),
(3,76),
(4,74),
(5,48),
(6,55),
(7,90),
(8,77),
(9,16),
(10,19),
(11,66),
(12,47)
select a.d, avg(b.bal)
from
@t a
left join @t b on b.d between a.d-(@p-1) and a.d
group by a.d
--@p1 is period of moving average, @o1 is offset
declare @p1 as int
declare @o1 as int
set @p1 = 5;
set @o1 = 3;
with np as(
select *, rank() over(partition by cmdty, tenor order by markdt) as r
from p_prices p1
where
1=1
)
, x1 as (
select s1.*, avg(s2.val) as avgval from np s1
inner join np s2
on s1.cmdty = s2.cmdty and s1.tenor = s2.tenor
and s2.r between s1.r - (@p1 - 1) - (@o1) and s1.r - (@o1)
group by s1.cmdty, s1.tenor, s1.markdt, s1.val, s1.r
)
select * from x1  -- final SELECT assumed; the snippet as posted ends with the CTE definition
I'm not sure that your expected result (output) shows a classic "simple moving (rolling) average" over 3 days, because, for example, the first triple of numbers by definition gives:
ThreeDaysMovingAverage = (2.230 + 3.150 + 5.520) / 3 = 3.6333333
but you expect 4.360 and it's confusing.
Nevertheless, I suggest the following solution, which uses the window function AVG. This approach is much more efficient (clearer and less resource-intensive) than the SELF JOIN introduced in other answers (and I'm surprised that no one has given a better solution).
-- Oracle-SQL dialect
with
data_table as (
select date '2012-05-01' AS dt, 2.230 AS clicks from dual union all
select date '2012-05-02' AS dt, 3.150 AS clicks from dual union all
select date '2012-05-03' AS dt, 5.520 AS clicks from dual union all
select date '2012-05-04' AS dt, 1.330 AS clicks from dual union all
select date '2012-05-05' AS dt, 2.260 AS clicks from dual union all
select date '2012-05-06' AS dt, 3.540 AS clicks from dual union all
select date '2012-05-07' AS dt, 2.330 AS clicks from dual
),
param as (select 3 days from dual)
select
dt AS "Date",
clicks AS "Clicks",
case when rownum >= p.days then
avg(clicks) over (order by dt
rows between p.days - 1 preceding and current row)
end
AS "3 day Moving Average"
from data_table t, param p;
You see that AVG is wrapped with case when rownum >= p.days then to force NULLs in first rows, where "3 day Moving Average" is meaningless.
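On SQL Server the same NULL guard can be written without rownum by counting the rows that actually fall inside the window frame. A minimal sketch (my own adaptation, assuming a table named ClicksTable with Date and Clicks columns; it is not part of the Oracle answer above):
select [Date]
     , Clicks
     , case when count(*) over (order by [Date] rows between 2 preceding and current row) = 3
            then avg(Clicks) over (order by [Date] rows between 2 preceding and current row)
       end as [3 day Moving Average]
from ClicksTable
order by [Date];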
We can apply Joe Celko's "dirty" left outer join method (as cited above by Diego Scaravaggi) to answer the question as it was asked.
declare @ClicksTable table ([Date] date, Clicks int)
insert into @ClicksTable
select '2012-05-01', 2230 union all
select '2012-05-02', 3150 union all
select '2012-05-03', 5520 union all
select '2012-05-04', 1330 union all
select '2012-05-05', 2260 union all
select '2012-05-06', 3540 union all
select '2012-05-07', 2330
This query:
SELECT
T1.[Date],
T1.Clicks,
-- AVG ignores NULL values so we have to explicitly NULLify
-- the days when we don't have a full 3-day sample
CASE WHEN count(T2.[Date]) < 3 THEN NULL
ELSE AVG(T2.Clicks)
END AS [3-Day Moving Average]
FROM @ClicksTable T1
LEFT OUTER JOIN @ClicksTable T2
ON T2.[Date] BETWEEN DATEADD(d, -2, T1.[Date]) AND T1.[Date]
GROUP BY T1.[Date], T1.Clicks
Generates the requested output:
Date Clicks 3-Day Moving Average
2012-05-01 2,230
2012-05-02 3,150
2012-05-03 5,520 4,360
2012-05-04 1,330 3,330
2012-05-05 2,260 3,120
2012-05-06 3,540 3,320
2012-05-07 2,330 3,010