I'm using MS Report Builder 3.0 / SQL Server 2012 and I have a database of "ticket" records. Each ticket has a status (simplified to open / closed), an origination date, and a completion date. I've been asked to build a cross-tab report that returns the number of open records as of the last day of the month for the last 12 months.
I could easily provide a report that shows open items NOW. I can also fairly easily calculate the number of open items on any given date (origination date <= @DATE, and comp date > @DATE or comp date is null). Using that logic, I could even define a dataset for each of the 12 periods for the given scope, but since each of those periods would be defined explicitly, they wouldn't be in the same field to use as the column group for the cross tab, so I don't know how I would actually be able to construct a single crosstab table that would summarize those results.
Anyone ever done anything like this and can share their method?
My most recent thought is to select each date period explicitly and combine them using unions and then use that as the basis for the report, but I'm having a tough time forcing my brain to congeal that concept into something I can execute.
Wouldn't something like this work? This query takes all the cases with status 'open' from last year then groups them by months.
select EOMONTH(completionDate), count(*)
from data
where ticketStatus = 'open' and completionDate > EOMONTH(DateAdd(Year, -1, GETDATE()))
group by EOMONTH(completionDate)
You can play with the completionDate > EOMONTH(DateAdd(Year, -1, GETDATE())) condition depending on what exactly you need.
Creating the query from the perspective of the time period was the solution.
SELECT
PERIOD_START
,COUNT (ACOUNT.ACTIVITY_ID) OPEN_ACT_COUNT
FROM
(
SELECT
DATEFROMPARTS(datepart(yy,dateadd(mm,-12,getdate())),datepart(mm,dateadd(mm,-12,getdate())),'01') AS PERIOD_START
,EOMONTH(DATEFROMPARTS(datepart(yy,dateadd(mm,-12,getdate())),datepart(mm,dateadd(mm,-12,getdate())),'01')) AS PERIOD_END
UNION
SELECT
DATEFROMPARTS(datepart(yy,dateadd(mm,-11,getdate())),datepart(mm,dateadd(mm,-11,getdate())),'01') AS PERIOD_START
,EOMONTH(DATEFROMPARTS(datepart(yy,dateadd(mm,-11,getdate())),datepart(mm,dateadd(mm,-11,getdate())),'01')) AS PERIOD_END
UNION
SELECT
DATEFROMPARTS(datepart(yy,dateadd(mm,-10,getdate())),datepart(mm,dateadd(mm,-10,getdate())),'01') AS PERIOD_START
,EOMONTH(DATEFROMPARTS(datepart(yy,dateadd(mm,-10,getdate())),datepart(mm,dateadd(mm,-10,getdate())),'01')) AS PERIOD_END
UNION
SELECT
DATEFROMPARTS(datepart(yy,dateadd(mm,-9,getdate())),datepart(mm,dateadd(mm,-9,getdate())),'01') AS PERIOD_START
,EOMONTH(DATEFROMPARTS(datepart(yy,dateadd(mm,-9,getdate())),datepart(mm,dateadd(mm,-9,getdate())),'01')) AS PERIOD_END
UNION
SELECT
DATEFROMPARTS(datepart(yy,dateadd(mm,-8,getdate())),datepart(mm,dateadd(mm,-8,getdate())),'01') AS PERIOD_START
,EOMONTH(DATEFROMPARTS(datepart(yy,dateadd(mm,-8,getdate())),datepart(mm,dateadd(mm,-8,getdate())),'01')) AS PERIOD_END
UNION
SELECT
DATEFROMPARTS(datepart(yy,dateadd(mm,-7,getdate())),datepart(mm,dateadd(mm,-7,getdate())),'01') AS PERIOD_START
,EOMONTH(DATEFROMPARTS(datepart(yy,dateadd(mm,-7,getdate())),datepart(mm,dateadd(mm,-7,getdate())),'01')) AS PERIOD_END
UNION
SELECT
DATEFROMPARTS(datepart(yy,dateadd(mm,-6,getdate())),datepart(mm,dateadd(mm,-6,getdate())),'01') AS PERIOD_START
,EOMONTH(DATEFROMPARTS(datepart(yy,dateadd(mm,-6,getdate())),datepart(mm,dateadd(mm,-6,getdate())),'01')) AS PERIOD_END
UNION
SELECT
DATEFROMPARTS(datepart(yy,dateadd(mm,-5,getdate())),datepart(mm,dateadd(mm,-5,getdate())),'01') AS PERIOD_START
,EOMONTH(DATEFROMPARTS(datepart(yy,dateadd(mm,-5,getdate())),datepart(mm,dateadd(mm,-5,getdate())),'01')) AS PERIOD_END
UNION
SELECT
DATEFROMPARTS(datepart(yy,dateadd(mm,-4,getdate())),datepart(mm,dateadd(mm,-4,getdate())),'01') AS PERIOD_START
,EOMONTH(DATEFROMPARTS(datepart(yy,dateadd(mm,-4,getdate())),datepart(mm,dateadd(mm,-4,getdate())),'01')) AS PERIOD_END
UNION
SELECT
DATEFROMPARTS(datepart(yy,dateadd(mm,-3,getdate())),datepart(mm,dateadd(mm,-3,getdate())),'01') AS PERIOD_START
,EOMONTH(DATEFROMPARTS(datepart(yy,dateadd(mm,-3,getdate())),datepart(mm,dateadd(mm,-3,getdate())),'01')) AS PERIOD_END
UNION
SELECT
DATEFROMPARTS(datepart(yy,dateadd(mm,-2,getdate())),datepart(mm,dateadd(mm,-2,getdate())),'01') AS PERIOD_START
,EOMONTH(DATEFROMPARTS(datepart(yy,dateadd(mm,-2,getdate())),datepart(mm,dateadd(mm,-2,getdate())),'01')) AS PERIOD_END
UNION
SELECT
DATEFROMPARTS(datepart(yy,dateadd(mm,-1,getdate())),datepart(mm,dateadd(mm,-1,getdate())),'01') AS PERIOD_START
,EOMONTH(DATEFROMPARTS(datepart(yy,dateadd(mm,-1,getdate())),datepart(mm,dateadd(mm,-1,getdate())),'01')) AS PERIOD_END
UNION
SELECT
DATEFROMPARTS(DATEPART(yy,getdate()),datepart(mm,getdate()),'01') AS PERIOD_START
,EOMONTH(DATEFROMPARTS(DATEPART(yy,getdate()),datepart(mm,getdate()),'01')) AS PERIOD_END
) PERIODS
LEFT JOIN (
SELECT
AROOT.ACTIVITY_ID
,DACT_PARTY.ACT_OWNER_DEPT_GRP
,DACT_ORI.ACT_ORI_DATE
,DACT_COM.ACT_COM_DATE
,CASE WHEN AROOT.LCYCLE_CD IN ('01','02','03','04','60','63') THEN 'OPEN'
WHEN AROOT.LCYCLE_CD IN ('06','07','09') THEN 'COMPLETE' END STATUS
FROM
DM_IAM_D_ACT_ROOT AROOT
JOIN DM_IAM_D_I_ROOT IROOT ON IROOT.DB_KEY = AROOT.PAR_ISSUE_UUID AND IROOT.APPLICATION = 'QIM' AND IROOT.LCYCLE_CD NOT IN ('10', '64') AND IROOT.ZZCAP_FACILITY = CASE WHEN @FAC = 'MT' THEN 'N200' WHEN @FAC = 'PI' THEN 'N202' WHEN @FAC = 'HU' THEN 'N204' ELSE @FAC END
OUTER APPLY (
SELECT
DM_IAM_D_ACT_PARTY.PARENT_KEY
,DM_BUT000.FIRST_LAST_NAME
,NPDA.FLEET_ID
,NPDA.DEPARTMENT
,DEPTS.ZZCAP_OWED_TO_SUB_DEPT_DESC AS DEPT_DESC
,DEPTS.ZZCAP_OWED_DEPT AS ACT_OWNER_DEPT_GRP
,NPDA.SUPERVISOR
FROM
DM_IAM_D_ACT_PARTY
JOIN DM_BUT000 ON DM_IAM_D_ACT_PARTY.PARTNER_ID = DM_BUT000.PARTNER
JOIN DM_ADR6 ON DM_BUT000.PERSNUMBER = DM_ADR6.PERSNUMBER
JOIN NMC_PERSONNEL_DATA NPDA ON LEFT(DM_ADR6.SMTP_ADDR,40) = LEFT(NPDA.WORK_EMAIL,40)
JOIN DM_ZCAP_OWED_DEPT DEPTS ON NPDA.DEPARTMENT = DEPTS.ZZCAP_OWED_TO_SUB_DEPT
WHERE
AROOT.DB_KEY = DM_IAM_D_ACT_PARTY.PARENT_KEY AND AROOT.MANDT = DM_IAM_D_ACT_PARTY.MANDT
AND DM_IAM_D_ACT_PARTY.PARTY_ROLE_CODE IN ('ACTDRIVR', 'ZASSIGN', 'ZACTAPP')
) AS DACT_PARTY
LEFT JOIN (
SELECT
DM_IAM_D_ACT_DATE.PARENT_KEY
,DATEFROMPARTS(LEFT(DM_IAM_D_ACT_DATE.DATE_TIME,4),RIGHT(LEFT(DM_IAM_D_ACT_DATE.DATE_TIME,6),2),RIGHT(LEFT(DM_IAM_D_ACT_DATE.DATE_TIME,8),2)) AS ACT_ORI_DATE
FROM
DM_IAM_D_ACT_DATE
WHERE
DM_IAM_D_ACT_DATE.ROLE_CD = 'ORI'
AND DM_IAM_D_ACT_DATE.DATE_TIME > 19000000000000
) DACT_ORI ON AROOT.DB_KEY = DACT_ORI.PARENT_KEY
LEFT JOIN (
SELECT
DM_IAM_D_ACT_DATE.PARENT_KEY
,DATEFROMPARTS(LEFT(DM_IAM_D_ACT_DATE.DATE_TIME,4),RIGHT(LEFT(DM_IAM_D_ACT_DATE.DATE_TIME,6),2),RIGHT(LEFT(DM_IAM_D_ACT_DATE.DATE_TIME,8),2)) AS ACT_COM_DATE
FROM
DM_IAM_D_ACT_DATE
WHERE
DM_IAM_D_ACT_DATE.ROLE_CD = 'COM'
AND DM_IAM_D_ACT_DATE.DATE_TIME > 19000000000000
) DACT_COM ON AROOT.DB_KEY = DACT_COM.PARENT_KEY
WHERE
AROOT.ACT_TEMPLATE IN (' ', 'CA', 'CAPR', 'CCA', 'FBC-TEMP', 'FBD-TEMP', 'MRA', 'OBD-TEMP', 'OBN_OBD_CO', 'OBN_OBD_FB', 'OBN-TEMP', 'IA')
AND AROOT.LCYCLE_CD NOT IN ('10','64')
AND AROOT.LONG_TERM <> 'X'
) ACOUNT ON ACOUNT.ACT_ORI_DATE <= PERIOD_END AND (ACOUNT.STATUS = 'OPEN' OR ACOUNT.ACT_COM_DATE >= PERIOD_END)
GROUP BY PERIOD_START
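The same period-first idea can be tried on a toy dataset. The sketch below is not the production query above: it uses Python's built-in sqlite3 module, a hypothetical `tickets` table with `ori_date` and `com_date` columns, and a fixed `as_of` date, and it generates the month-end rows with a recursive CTE instead of hand-written UNION branches. The open-as-of test is the one from the question (originated on or before the date, and not yet completed on it).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical, simplified ticket table; dates are ISO strings so they compare correctly as text.
conn.execute("CREATE TABLE tickets (id INTEGER, ori_date TEXT, com_date TEXT)")
conn.executemany(
    "INSERT INTO tickets VALUES (?, ?, ?)",
    [
        (1, "2023-01-10", "2023-02-15"),  # open at the end of January only
        (2, "2023-01-20", None),          # never completed
        (3, "2023-02-05", "2023-02-20"),  # opened and closed inside February
        (4, "2023-03-01", "2023-04-02"),  # open at the end of March
    ],
)

QUERY = """
WITH RECURSIVE nums(n) AS (
    SELECT 0
    UNION ALL
    SELECT n + 1 FROM nums WHERE n + 1 < :months
),
periods AS (
    -- n = 0 is the end of the as_of month, n = 1 the end of the month before, and so on
    SELECT date(:as_of, 'start of month', (1 - n) || ' month', '-1 day') AS period_end
    FROM nums
)
SELECT p.period_end, COUNT(t.id) AS open_count
FROM periods p
LEFT JOIN tickets t
  ON t.ori_date <= p.period_end
 AND (t.com_date IS NULL OR t.com_date > p.period_end)
GROUP BY p.period_end
ORDER BY p.period_end
"""

rows = conn.execute(QUERY, {"as_of": "2023-03-15", "months": 3}).fetchall()
for period_end, open_count in rows:
    print(period_end, open_count)
```

Because every period is a row in one result set, `period_end` can serve directly as the column group of the cross-tab.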
I have a table that has aggregations down to the hour level YYYYMMDDHH. The data is aggregated and loaded by an external process (I don't have control over). I want to test the data on a monthly basis.
The question I am looking to answer is: Does every hour in the month exist?
I'm looking to produce output that will return a 1 if the hour exists or 0 if the hour does not exist.
The aggregation table looks something like this...
YYYYMM YYYYMMDD YYYYMMDDHH DATA_AGG
201911 20191101 2019110100 100
201911 20191101 2019110101 125
201911 20191101 2019110103 135
201911 20191101 2019110105 95
… … … …
201911 20191130 2019113020 100
201911 20191130 2019113021 110
201911 20191130 2019113022 125
201911 20191130 2019113023 135
And defined as...
CREATE TABLE YYYYMMDDHH_DATA_AGG (
YYYYMM VARCHAR,
YYYYMMDD VARCHAR,
YYYYMMDDHH VARCHAR,
DATA_AGG INT
);
I'm looking to produce the following below...
YYYYMMDDHH HOUR_EXISTS
2019110100 1
2019110101 1
2019110102 0
2019110103 1
2019110104 0
2019110105 1
... ...
In the example above, two hours do not exist, 2019110102 and 2019110104.
I assume I'd have to join the aggregation table against a computed table that contains all the YYYYMMDDHH combos???
The database is Snowflake, but assume most generic ANSI SQL queries will work.
You can get what you want with a recursive CTE.
The recursive CTE generates the list of possible hours, and then a simple left outer join gives you the flag for whether any records match that hour.
WITH RECURSIVE CTE (YYYYMMDDHH) as
(
SELECT YYYYMMDDHH
FROM YYYYMMDDHH_DATA_AGG
WHERE YYYYMMDDHH = (SELECT MIN(YYYYMMDDHH) FROM YYYYMMDDHH_DATA_AGG)
UNION ALL
SELECT TO_VARCHAR(DATEADD(HOUR, 1, TO_TIMESTAMP(C.YYYYMMDDHH, 'YYYYMMDDHH')), 'YYYYMMDDHH') YYYYMMDDHH
FROM CTE C
WHERE TO_VARCHAR(DATEADD(HOUR, 1, TO_TIMESTAMP(C.YYYYMMDDHH, 'YYYYMMDDHH')), 'YYYYMMDDHH') <= (SELECT MAX(YYYYMMDDHH) FROM YYYYMMDDHH_DATA_AGG)
)
SELECT
C.YYYYMMDDHH,
IFF(A.YYYYMMDDHH IS NOT NULL, 1, 0) HOUR_EXISTS
FROM CTE C
LEFT OUTER JOIN YYYYMMDDHH_DATA_AGG A
ON C.YYYYMMDDHH = A.YYYYMMDDHH;
If your time range is too long, you'll run into problems with the CTE recursing too deeply. You can create a table or temp table with all of the possible hours instead. For example:
CREATE OR REPLACE TEMPORARY TABLE HOURS (YYYYMMDDHH VARCHAR) AS
SELECT TO_VARCHAR(DATEADD(HOUR, SEQ4(), TO_TIMESTAMP((SELECT MIN(YYYYMMDDHH) FROM YYYYMMDDHH_DATA_AGG), 'YYYYMMDDHH')), 'YYYYMMDDHH')
FROM TABLE(GENERATOR(ROWCOUNT => 10000)) V
ORDER BY 1;
SELECT
H.YYYYMMDDHH,
IFF(A.YYYYMMDDHH IS NOT NULL, 1, 0) HOUR_EXISTS
FROM HOURS H
LEFT OUTER JOIN YYYYMMDDHH_DATA_AGG A
ON H.YYYYMMDDHH = A.YYYYMMDDHH
WHERE H.YYYYMMDDHH <= (SELECT MAX(YYYYMMDDHH) FROM YYYYMMDDHH_DATA_AGG);
You can then fiddle with the generator count to make sure you have enough hours.
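The generate-then-left-join idea is easy to check on a tiny sample. The sketch below is an illustration in Python with the built-in sqlite3 module, not Snowflake SQL; the `data_agg` table name and the `start`/`end` parameters are hypothetical, and only one day is generated to keep the result short.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE data_agg (yyyymmddhh TEXT, data_agg INTEGER)")
conn.executemany(
    "INSERT INTO data_agg VALUES (?, ?)",
    [("2019110100", 100), ("2019110101", 125), ("2019110103", 135), ("2019110105", 95)],
)

QUERY = """
WITH RECURSIVE hours(ts) AS (
    SELECT :start
    UNION ALL
    SELECT datetime(ts, '+1 hour') FROM hours
    WHERE datetime(ts, '+1 hour') < :end   -- end is exclusive
)
SELECT strftime('%Y%m%d%H', h.ts) AS yyyymmddhh,
       CASE WHEN a.yyyymmddhh IS NULL THEN 0 ELSE 1 END AS hour_exists
FROM hours h
LEFT JOIN data_agg a ON a.yyyymmddhh = strftime('%Y%m%d%H', h.ts)
ORDER BY 1
"""

rows = conn.execute(
    QUERY, {"start": "2019-11-01 00:00:00", "end": "2019-11-02 00:00:00"}
).fetchall()
missing = [hour for hour, exists in rows if exists == 0]
print(rows[:6])
print(len(missing), "hours missing")
```

To test a whole month, only the `end` parameter needs to change (for example to the first instant of the following month).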
You can generate a table with every hour of the month and LEFT OUTER JOIN your aggregation to it:
WITH EVERY_HOUR AS (
SELECT TO_CHAR(DATEADD(HOUR, HH, TO_DATE(YYYYMM::TEXT, 'YYYYMM')),
'YYYYMMDDHH')::NUMBER YYYYMMDDHH
FROM (SELECT DISTINCT YYYYMM FROM YYYYMMDDHH_DATA_AGG) t
CROSS JOIN (
SELECT ROW_NUMBER() OVER (ORDER BY NULL) - 1 HH
FROM TABLE(GENERATOR(ROWCOUNT => 745))
) h
QUALIFY YYYYMMDDHH < (YYYYMM + 1) * 10000
)
SELECT h.YYYYMMDDHH, NVL2(a.YYYYMM, 1, 0) HOUR_EXISTS
FROM EVERY_HOUR h
LEFT OUTER JOIN YYYYMMDDHH_DATA_AGG a ON a.YYYYMMDDHH = h.YYYYMMDDHH
Here's something that might help get you started. I'm guessing you want to have 'synthetic' [YYYYMMDD] values? Otherwise, if the values aren't there, then they shouldn't appear in the list.
DROP TABLE IF EXISTS #_hours
DROP TABLE IF EXISTS #_temp
--Populate a table with hours ranging from 00 to 23
CREATE TABLE #_hours ([hour_value] VARCHAR(2))
DECLARE @_i INT = 0
WHILE (@_i < 24)
BEGIN
INSERT INTO #_hours
SELECT FORMAT(@_i, '0#')
SET @_i += 1
END
-- Replicate OP's sample data set
CREATE TABLE #_temp (
[YYYYMM] INTEGER
, [YYYYMMDD] INTEGER
, [YYYYMMDDHH] INTEGER
, [DATA_AGG] INTEGER
)
INSERT INTO #_temp
VALUES
(201911, 20191101, 2019110100, 100),
(201911, 20191101, 2019110101, 125),
(201911, 20191101, 2019110103, 135),
(201911, 20191101, 2019110105, 95),
(201911, 20191130, 2019113020, 100),
(201911, 20191130, 2019113021, 110),
(201911, 20191130, 2019113022, 125),
(201911, 20191130, 2019113023, 135)
SELECT X.YYYYMM, X.YYYYMMDD, X.YYYYMMDDHH
-- Case: If 'target_hours' doesn't exist, then 0, else 1
, CASE WHEN X.target_hours IS NULL THEN '0' ELSE '1' END AS [HOUR_EXISTS]
FROM (
-- Select right 2 characters from converted [YYYYMMDDHH] to act as 'target values'
SELECT T.*
, RIGHT(CAST(T.[YYYYMMDDHH] AS VARCHAR(10)), 2) AS [target_hours]
FROM #_temp AS T
) AS X
-- Right join to keep all of our hours and only the target hours that match.
RIGHT JOIN #_hours AS H ON H.hour_value = X.target_hours
Sample output:
YYYYMM YYYYMMDD YYYYMMDDHH HOUR_EXISTS
201911 20191101 2019110100 1
201911 20191101 2019110101 1
NULL NULL NULL 0
201911 20191101 2019110103 1
NULL NULL NULL 0
201911 20191101 2019110105 1
NULL NULL NULL 0
With (almost) standard sql, you can do a cross join of the distinct values of YYYYMMDD to a list of all possible hours and then left join to the table:
select concat(d.YYYYMMDD, h.hour) as YYYYMMDDHH,
case when t.YYYYMMDDHH is null then 0 else 1 end as hour_exists
from (select distinct YYYYMMDD from tablename) as d
cross join (
select '00' as hour union all select '01' union all
select '02' union all select '03' union all
select '04' union all select '05' union all
select '06' union all select '07' union all
select '08' union all select '09' union all
select '10' union all select '11' union all
select '12' union all select '13' union all
select '14' union all select '15' union all
select '16' union all select '17' union all
select '18' union all select '19' union all
select '20' union all select '21' union all
select '22' union all select '23'
) as h
left join tablename as t
on concat(d.YYYYMMDD, h.hour) = t.YYYYMMDDHH
order by concat(d.YYYYMMDD, h.hour)
Maybe in Snowflake you can construct the list of hours with a sequence much easier instead of all those UNION ALLs.
This version accounts for the full range of days, across months and years. It's a simple cross join of the set of possible days with the set of possible hours of the day -- left joined to actual dates.
set first = (select min(yyyymmdd::number) from YYYYMMDDHH_DATA_AGG);
set last = (select max(yyyymmdd::number) from YYYYMMDDHH_DATA_AGG);
with
hours as (select row_number() over (order by null) - 1 h from table(generator(rowcount=>24))),
days as (
select
row_number() over (order by null) - 1 as n,
to_date($first::text, 'YYYYMMDD')::date + n as d,
to_char(d, 'YYYYMMDD') as yyyymmdd
from table(generator(rowcount=>($last-$first+1)))
)
select days.yyyymmdd || lpad(hours.h,2,0) as YYYYMMDDHH, nvl2(t.yyyymmddhh,1,0) as HOUR_EXISTS
from days cross join hours
left join YYYYMMDDHH_DATA_AGG t on t.yyyymmddhh = days.yyyymmdd || lpad(hours.h,2,0)
order by 1
;
$first and $last can be packed in as sub-queries if you prefer.
I have a table tracking location stays. There's an ID, startdatetime, enddatetime, and other fields.
I have another table with events that occur within each of those stays, with similar start and end times, and linked on the ID field.
What I need to do is merge the two and split the location table up into its individual events. The trick here is a location may start on 2017-08-02 but the first event might not start for a few days. Thus i'd need a record for that gap at the start.
sample data
CREATE TABLE #Stays (
EpID INT, StayId INT, StayStartDate DateTime, StayEndDate DateTime);
CREATE TABLE #Events (
EpID INT, EventId INT, EventStartDate DateTime, EventEndDate DateTime, EventNumber INT);
INSERT INTO #Events SELECT 1, 7897, '2016-11-24 00:00:00.000','2016-11-26 00:00:00.000', 1
INSERT INTO #Events SELECT 1, 7898, '2016-11-26 00:00:00.000','2016-11-28 00:00:00.000', 2
INSERT INTO #Stays SELECT 1, 10, '2016-11-22 08:15:00.000','2016-11-24 10:54:00.000'
INSERT INTO #Stays SELECT 1, 11, '2016-11-24 10:54:00.000','2016-11-24 11:17:00.000'
INSERT INTO #Stays SELECT 1, 12, '2016-11-24 11:17:00.000','2016-11-25 08:16:00.000'
INSERT INTO #Stays SELECT 1, 13, '2016-11-25 08:16:00.000','2016-11-28 23:15:00.000'
expected output would be
EpId StartDate EndDate EventNumber
1 2016-11-22 08:15:00.000 2016-11-23 23:59:59.000 NULL
1 2016-11-24 00:00:00.000 2016-11-25 23:59:59.000 7897
1 2016-11-26 00:00:00.000 2016-11-27 23:59:59.000 7898
1 2016-11-28 00:00:00.000 2016-11-28 23:15:00.000 NULL
Here is what I'm trying. It currently doesn't work properly, and I'm sure the method I'm working on is probably not the best. It's currently not melding the two datasets together.
My guess is there's a much easier way to do it with outer or cross apply, but my knowledge of how they work is rather limited.
Any help?
;with e as (
SELECT [EpID]
,EventId
,[EventNumber]
,case when [EventStartDate] > DayStart then [EventStartDate] else DayStart end as [EventStart]
,case when [EventEndDate] < DayEnd then [EventEndDate] else DayEnd end as [EventEnd]
FROM [Events] e
inner join DimStaySegmentDayReference d on d.DayEnd >= e.[EventStartDate] and d.DayStart <= e.[EventEndDate]
),
s as (
select
[EpID]
,StayId
,case when StayStartDate > DayStart then StayStartDate else DayStart end as [StayStart]
,case when StayEndDate < DayEnd then StayEndDate else DayEnd end as [StayEnd]
from Stays s
inner join DimStaySegmentDayReference d on d.DayEnd >= StayStartDate and d.DayStart <= StayEndDate
),
u as (select 'stay' as source, [EpID], StayStart, StayEnd, '' as event from s
union all
select 'event' as source, [EpID], [EventStart], [EventEnd], eventnumber as event from e)
select Source,
[EpID],
Staystart,
stayend,
case when lag(stayend) over (partition by EpId ORDER BY STAYSTART) < StayEnd-0.0001 AND source='event' then lag(stayend) over (partition by EpId ORDER BY STAYSTART) else staystart end as staystartnew,
case when lead(staystart) over (partition by EpID ORDER BY StayStart) < stayend then lead(staystart) over (partition by EpID ORDER BY StayStart) else stayend end as stayendnew,
event
from u
where StayStart <> stayend
order by StayStart
The DayReference table is simply every day with a start and end time so i can split the record into day segments.
I'm using SQL Server 2012
Edit for some context
I've updated my sample data to make it a bit clearer.
The stay table tracks location stays. In this provided case i'm ignoring multiple locations to make finding a solution easier.
Locations and Events are agnostic to each other, other than occurring for the same EpID within the same time frame.
As an example consider tracking time at work, you start at 9am and finish at 5pm. For this work day you'll have say 5 location stays making up the full shift. 9-11 desk, 11-12 meeting, 12-1 lunch, 1-3 meeting, 3-5 desk.
You then have a series of events, lets call it drinking coffee. You drink coffee between 9:30 and 10, and 2-4.
What I need to do is mesh together these two sets of data creating a single timeline.
9-930 desk, 930-10 coffee, 10-11 desk, 11-12 meeting, 12-1 lunch, 1-2 meeting, 2-4 coffee, 4-5 desk.
Hope this helps
Some things can probably be simplified, but this way it is easy to read what I am validating in each case. Also, I think one row is missing from your example output: I got a last row from 2018-09-14 16:00 to 2018-09-15 12:00, and I did not find a reason in the logic or the question to discard it.
Extra validations and a left join to the Stays with no registered events would be needed, but here is my approach:
;WITH CTE AS (
SELECT D.*, s.StayId,
EventNumber,
LAG(D.DStart) OVER (ORDER BY EventNumber) As LagStart,
LAG(StayID) OVER (ORDER BY EventNumber) As LagStay,
LAG(Event) OVER (ORDER BY EventNumber) As LagEvent,
LEAD(D.DEnd) OVER (ORDER BY EventNumber) As LeadEnd,
LEAD(StayID) OVER (ORDER BY EventNumber) As LeadStay,
LEAD(Event) OVER (ORDER BY EventNumber) As LeadEvent
FROM #Events E
CROSS APPLY
(
SELECT TOP 1 * FROM #Stays S WHERE E.EventStartDate BETWEEN S.StayStartDate AND S.StayEndDate
UNION
SELECT TOP 1 * FROM #Stays S WHERE E.EventEndDate BETWEEN S.StayStartDate AND S.StayEndDate
) S
CROSS APPLY (
SELECT StayStartDate AS DStart, EventStartDate DEnd, Null AS Event, 1 as c WHERE StayStartDate < EventStartDate
UNION
SELECT EventStartDate, EventEndDate, EventNumber, 2 WHERE EventStartDate >= StayStartDate AND EventEndDate <= StayEndDate
UNION
SELECT StayStartDate, EventEndDate, EventNumber, 3 WHERE StayStartDate > EventStartDate AND EventEndDate < StayEndDate
UNION
SELECT EventStartDate, StayEndDate, EventNumber, 4 WHERE StayStartDate < EventStartDate AND EventEndDate > StayEndDate
UNION
SELECT EventEndDate, StayEndDate, Null, 5 WHERE EventEndDate < StayEndDate
) D
)
SELECT DISTINCT
CASE WHEN LagStay = StayId AND Event IS NULL AND LagEvent IS NULL THEN LagStart
ELSE DStart END AS StartDate,
CASE WHEN LeadStay = StayId AND Event IS NULL AND LeadEvent IS NULL THEN LeadEnd
ELSE DEnd END AS EndDate,
Event, StayID
FROM CTE
ORDER BY StartDate
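The work-day example from the question (desk, meeting, lunch, coffee) can be reproduced outside SQL to check what the merged timeline should look like. The sketch below is plain Python, not a translation of the query above. `mesh` and the `(start, end, label)` tuple layout are hypothetical, times are written as decimal hours, and it assumes that events never overlap each other and stays never overlap each other.

```python
def mesh(stays, events):
    """Split stays at event boundaries; an event overrides the stay it overlaps."""
    # Every start or end time is a potential cut point on the timeline.
    points = sorted({t for start, end, _ in stays + events for t in (start, end)})
    timeline = []
    for seg_start, seg_end in zip(points, points[1:]):
        label = None
        # Events are listed first so they win over the underlying stay.
        for start, end, name in events + stays:
            if start <= seg_start and seg_end <= end:
                label = name
                break
        if label is None:
            continue  # covered by neither a stay nor an event
        if timeline and timeline[-1][2] == label and timeline[-1][1] == seg_start:
            # Same label continues: extend the previous segment instead of adding a new one.
            timeline[-1] = (timeline[-1][0], seg_end, label)
        else:
            timeline.append((seg_start, seg_end, label))
    return timeline


stays = [(9, 11, "desk"), (11, 12, "meeting"), (12, 13, "lunch"),
         (13, 15, "meeting"), (15, 17, "desk")]
events = [(9.5, 10, "coffee"), (14, 16, "coffee")]

timeline = mesh(stays, events)
for segment in timeline:
    print(segment)
```

Note that this sketch merges adjacent segments by label only, so two back-to-back stays with the same label would be fused; keying on a stay ID instead would keep them apart.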
I want to get all times that an event is not taking place for each room. The start of the day is 9:00:00 and end is 22:00:00.
What my database looks like is this:
Event EventStart EventEnd Days Rooms DayStarts
CISC 3660 09:00:00 12:30:00 Monday 7-3 9/19/2014
MATH 2501 15:00:00 17:00:00 Monday:Wednesday 7-2 10/13/2014
CISC 1110 14:00:00 16:00:00 Monday 7-3 9/19/2014
I want to get the times that aren't in the database.
ex. For SelectedDate (9/19/2014) the table should return:
Room FreeTimeStart FreeTimeEnd
7-3 12:30:00 14:00:00
7-3 16:00:00 22:00:00
ex2. SelectedDate (10/13/2014):
Room FreeTimeStart FreeTimeEnd
7-2 9:00:00 15:00:00
7-2 17:00:00 22:00:00
What I have tried is something like this:
select * from Events where ________ NOT BETWEEN eventstart AND eventend;
But I do not know what to put in the place of the space.
This was a pretty complex request. SQL works best with sets, and not looking at line by line. Here is what I came up with. To make it easier to figure out, I wrote it as a series of CTE's so I could work through the problem a step at a time. I am not saying that this is the best possible way to do it, but it doesn't require the use of any cursors. You need the Events table and a table of the room names (otherwise, you don't see a room that doesn't have any bookings).
Here is the query and I will explain the methodology.
DECLARE @Events TABLE (Event varchar(20), EventStart Time, EventEnd Time, Days varchar(50), Rooms varchar(10), DayStarts date)
INSERT INTO @Events
SELECT 'CISC 3660', '09:00:00', '12:30:00', 'Monday', '7-3', '9/19/2014' UNION
SELECT 'MATH 2501', '15:00:00', '17:00:00', 'Monday:Wednesday', '7-2', '10/13/2014' UNION
SELECT 'CISC 1110', '14:00:00', '16:00:00', 'Monday', '7-3', '9/19/2014'
DECLARE @Rooms TABLE (RoomName varchar(10))
INSERT INTO @Rooms
SELECT '7-2' UNION
SELECT '7-3'
DECLARE @SelectedDate date = '9/19/2014'
DECLARE @MinTimeInterval int = 30 --smallest time unit room can be reserved for
;WITH
D1(N) AS (
SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1 UNION ALL
SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1 UNION ALL
SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1
),
D2(N) AS (SELECT 1 FROM D1 a, D1 b),
D4(N) AS (SELECT 1 FROM D2 a, D2 b),
Numbers AS (SELECT TOP 3600 ROW_NUMBER() OVER (ORDER BY (SELECT NULL)) -1 AS Number FROM D4),
AllTimes AS
(SELECT CAST(DATEADD(n,Numbers.Number*@MinTimeInterval,'09:00:00') as time) AS m FROM Numbers
WHERE DATEADD(n,Numbers.Number*@MinTimeInterval,'09:00:00') <= '22:00:00'),
OccupiedTimes AS (
SELECT e.Rooms, ValidTimes.m
FROM @Events E
CROSS APPLY (SELECT m FROM AllTimes WHERE m BETWEEN CASE WHEN e.EventStart = '09:00:00' THEN e.EventStart ELSE DATEADD(n,1,e.EventStart) END and CASE WHEN e.EventEnd = '22:00:00' THEN e.EventEnd ELSE DATEADD(n,-1,e.EventEnd) END) ValidTimes
WHERE e.DayStarts = @SelectedDate
),
AllRoomsAllTimes AS (
SELECT * FROM @Rooms R CROSS JOIN AllTimes
), AllOpenTimes AS (
SELECT a.*, ROW_NUMBER() OVER( PARTITION BY (a.RoomName) ORDER BY a.m) AS pos
FROM AllRoomsAllTimes A
LEFT OUTER JOIN OccupiedTimes o ON a.RoomName = o.Rooms AND a.m = o.m
WHERE o.m IS NULL
), Finalize AS (
SELECT a1.RoomName,
CASE WHEN a3.m IS NULL OR DATEDIFF(n,a3.m, a1.m) > @MinTimeInterval THEN a1.m else NULL END AS FreeTimeStart,
CASE WHEN a2.m IS NULL OR DATEDIFF(n,a1.m,a2.m) > @MinTimeInterval THEN A1.m ELSE NULL END AS FreeTimeEnd,
ROW_NUMBER() OVER( ORDER BY a1.RoomName ) AS Pos
FROM AllOpenTimes A1
LEFT OUTER JOIN AllOpenTimes A2 ON a1.RoomName = a2.RoomName and a1.pos = a2.pos-1
LEFT OUTER JOIN AllOpenTimes A3 ON a1.RoomName = a3.RoomName and a1.pos = a3.pos+1
WHERE A2.m IS NULL OR DATEDIFF(n,a1.m,a2.m) > @MinTimeInterval
OR
A3.m IS NULL OR DATEDIFF(n,a3.m, a1.m) > @MinTimeInterval
)
SELECT F1.RoomName, f1.FreeTimeStart, f2.FreeTimeEnd FROM Finalize F1
LEFT OUTER JOIN Finalize F2 ON F1.Pos = F2.pos-1 AND f1.RoomName = f2.RoomName
WHERE f1.pos % 2 = 1
In the first several lines, I create table variables to simulate your tables Events and Rooms.
The variable @MinTimeInterval determines what time interval the room schedules can be on (every 30 min, 15 min, etc.; this number needs to divide evenly into 60).
Since SQL cannot query data that is missing, we need to create a table that holds all of the times that we want to check for. The first several lines in the WITH create a table called AllTimes, which holds all the possible time intervals in your day.
Next, we get a list of all of the times that are occupied (OccupiedTimes), and then LEFT OUTER JOIN this table to the AllTimes table, which gives us all the available times. Since we only want the start and end of each free time, create the Finalize table, which self joins each record to the previous and next record in the table. If the times in these rows are more than @MinTimeInterval minutes apart, then we know it is either a start or end of a free time.
Finally we self join this last table to put the start and end times in the same row and only look at every other row.
This will need to be adjusted if a single row in Events spans multiple days or multiple rooms.
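As a cross-check on the expected result, the free-time computation can also be written as a simple sweep over one day's bookings. This is a Python sketch with hypothetical names (`free_slots`, the `bookings` tuple layout), not a rewrite of the query; it assumes a single date at a time and the 09:00 to 22:00 day from the question.

```python
from datetime import time

DAY_START, DAY_END = time(9, 0), time(22, 0)


def free_slots(bookings, rooms):
    """Return {room: [(free_start, free_end), ...]} for one day of bookings."""
    result = {}
    for room in rooms:
        cursor = DAY_START  # earliest time not yet known to be booked
        gaps = []
        # Walk the room's bookings in start order and record the holes between them.
        for start, end in sorted(b[1:] for b in bookings if b[0] == room):
            if start > cursor:
                gaps.append((cursor, start))
            cursor = max(cursor, end)
        if cursor < DAY_END:
            gaps.append((cursor, DAY_END))
        result[room] = gaps
    return result


# Bookings for 9/19/2014 from the question: (room, start, end)
bookings = [
    ("7-3", time(9, 0), time(12, 30)),   # CISC 3660
    ("7-3", time(14, 0), time(16, 0)),   # CISC 1110
]
slots = free_slots(bookings, ["7-2", "7-3"])
for room, gaps in slots.items():
    print(room, gaps)
```

Passing the room list separately mirrors the point made above: a room with no bookings only shows up as free all day if the rooms come from their own table.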
Here's a solution that will return the "complete picture" including rooms that aren't booked at all for the day in question:
Declare @Date char(8) = '20141013'
;
WITH cte as
(
SELECT *
FROM -- use your table name instead of the VALUES construct
(VALUES
('09:00:00','12:30:00' ,'7-3', '20140919'),
('15:00:00','17:00:00' ,'7-2', '20141013'),
('14:00:00','16:00:00' ,'7-3', '20140919')) x(EventStart , EventEnd,Rooms, DayStarts)
), cte_Days_Rooms AS
-- get a cartesian product for the day specified and all rooms as well as the start and end time to compare against
(
SELECT y.EventStart,y.EventEnd, x.rooms,a.DayStarts FROM
(SELECT @Date DayStarts) a
CROSS JOIN
(SELECT DISTINCT Rooms FROM cte)x
CROSS JOIN
(SELECT '09:00:00' EventStart,'09:00:00' EventEnd UNION ALL
SELECT '22:00:00' EventStart,'22:00:00' EventEnd) y
), cte_1 AS
-- Merge the original data and the "base data"
(
SELECT * FROM cte WHERE DayStarts=@Date
UNION ALL
SELECT * FROM cte_Days_Rooms
), cte_2 as
-- use the ROW_NUMBER() approach to sort the data
(
SELECT *, ROW_NUMBER() OVER(PARTITION BY DayStarts, Rooms ORDER BY EventStart) as pos
FROM cte_1
)
-- final query: self join with an offset of one row, eliminating duplicate rows if a room is booked starting 9:00 or ending 22:00
SELECT c2a.DayStarts, c2a.Rooms , c2a.EventEnd, c2b.EventStart
FROM cte_2 c2a
INNER JOIN cte_2 c2b on c2a.DayStarts = c2b.DayStarts AND c2a.Rooms =c2b.Rooms AND c2a.pos = c2b.pos -1
WHERE c2a.EventEnd <> c2b.EventStart
ORDER BY c2a.DayStarts, c2a.Rooms
How do you create a moving average in SQL?
Current table:
Date Clicks
2012-05-01 2,230
2012-05-02 3,150
2012-05-03 5,520
2012-05-04 1,330
2012-05-05 2,260
2012-05-06 3,540
2012-05-07 2,330
Desired table or output:
Date Clicks 3 day Moving Average
2012-05-01 2,230
2012-05-02 3,150
2012-05-03 5,520 4,360
2012-05-04 1,330 3,330
2012-05-05 2,260 3,120
2012-05-06 3,540 3,320
2012-05-07 2,330 3,010
This is an evergreen Joe Celko question.
I don't know which DBMS platform is being used, but in any case Joe answered this more than 10 years ago with standard SQL.
Joe Celko SQL Puzzles and Answers citation:
"That last update attempt suggests that we could use the predicate to
construct a query that would give us a moving average:"
SELECT S1.sample_time, AVG(S2.load) AS avg_prev_hour_load
FROM Samples AS S1, Samples AS S2
WHERE S2.sample_time
BETWEEN (S1.sample_time - INTERVAL 1 HOUR)
AND S1.sample_time
GROUP BY S1.sample_time;
Is the extra column or the query approach better? The query is
technically better because the UPDATE approach will denormalize the
database. However, if the historical data being recorded is not going
to change and computing the moving average is expensive, you might
consider using the column approach.
MS SQL Example:
CREATE TABLE #TestDW
( Date1 datetime,
LoadValue Numeric(13,6)
);
INSERT INTO #TestDW VALUES('2012-06-09' , '3.540' );
INSERT INTO #TestDW VALUES('2012-06-08' , '2.260' );
INSERT INTO #TestDW VALUES('2012-06-07' , '1.330' );
INSERT INTO #TestDW VALUES('2012-06-06' , '5.520' );
INSERT INTO #TestDW VALUES('2012-06-05' , '3.150' );
INSERT INTO #TestDW VALUES('2012-06-04' , '2.230' );
SQL Puzzle query:
SELECT S1.date1, AVG(S2.LoadValue) AS avg_prev_3_days
FROM #TestDW AS S1, #TestDW AS S2
WHERE S2.date1
BETWEEN DATEADD(d, -2, S1.date1 )
AND S1.date1
GROUP BY S1.date1
order by 1;
One way to do this is to join on the same table a few times.
select
(Cur.Clicks
+ isnull(P1.Clicks, 0)
+ isnull(P2.Clicks, 0)
+ isnull(P3.Clicks, 0)) / 4.0 as MovingAvg3 -- averages four rows: the current day and the three before it; a missing day counts as 0
from
MyTable as Cur -- CURRENT is a reserved word in T-SQL, so the alias is shortened to Cur
left join MyTable as P1 on P1.Date = DateAdd(day, -1, Cur.Date)
left join MyTable as P2 on P2.Date = DateAdd(day, -2, Cur.Date)
left join MyTable as P3 on P3.Date = DateAdd(day, -3, Cur.Date)
Adjust the DateAdd component of the ON-Clauses to match whether you want your moving average to be strictly from the past-through-now or days-ago through days-ahead.
This works nicely for situations where you need a moving average over only a few data points.
This is not an optimal solution for moving averages with more than a few data points.
select t2.date, round(sum(ct.clicks)/3) as avg_clicks
from
(select date from clickstable) as t2,
(select date, clicks from clickstable) as ct
where datediff(t2.date, ct.date) between 0 and 2
group by t2.date
Obviously you can change the interval to whatever you need. You could also use count() instead of a magic number to make it easier to change, but that will also slow it down.
General template for rolling averages that scales well for large data sets
WITH moving_avg AS (
SELECT 0 AS [lag] UNION ALL
SELECT 1 AS [lag] UNION ALL
SELECT 2 AS [lag] UNION ALL
SELECT 3 AS [lag] --ETC
)
SELECT
DATEADD(day,[lag],[date]) AS [reference_date],
[otherkey1],[otherkey2],[otherkey3],
AVG([value1]) AS [avg_value1],
AVG([value2]) AS [avg_value2]
FROM [data_table]
CROSS JOIN moving_avg
GROUP BY [otherkey1],[otherkey2],[otherkey3],DATEADD(day,[lag],[date])
ORDER BY [otherkey1],[otherkey2],[otherkey3],[reference_date];
And for weighted rolling averages:
WITH weighted_avg AS (
SELECT 0 AS [lag], 1.0 AS [weight] UNION ALL
SELECT 1 AS [lag], 0.6 AS [weight] UNION ALL
SELECT 2 AS [lag], 0.3 AS [weight] UNION ALL
SELECT 3 AS [lag], 0.1 AS [weight] --ETC
)
SELECT
DATEADD(day,[lag],[date]) AS [reference_date],
[otherkey1],[otherkey2],[otherkey3],
AVG([value1] * [weight]) / AVG([weight]) AS [wavg_value1],
AVG([value2] * [weight]) / AVG([weight]) AS [wavg_value2]
FROM [data_table]
CROSS JOIN weighted_avg
GROUP BY [otherkey1],[otherkey2],[otherkey3],DATEADD(day,[lag],[date])
ORDER BY [otherkey1],[otherkey2],[otherkey3],[reference_date];
select *
, (select avg(c2.clicks) from #clicks_table c2
where c2.date between dateadd(dd, -2, c1.date) and c1.date) mov_avg
from #clicks_table c1
Use a different join predicate:
SELECT current.date
,avg(periods.clicks)
FROM current left outer join current as periods
ON current.date BETWEEN dateadd(d,-2, periods.date) AND periods.date
GROUP BY current.date HAVING COUNT(*) >= 3
The having statement will prevent any dates without at least N values from being returned.
Assume x is the value to be averaged, xDate is the date value, and @asOf (a hypothetical variable) is the day the average should end on:
SELECT avg(x) from myTable WHERE xDate BETWEEN dateadd(d, -2, @asOf) and @asOf
In hive, maybe you could try
select date, clicks, avg(clicks) over (order by date rows between 2 preceding and current row) as moving_avg from clicktable;
For this purpose, I'd create an auxiliary (dimension) date table like
create table date_dim(date date, date_1 date, date_2 date, date_3 date ...)
where date is the key, date_1 is for this day, date_2 covers this day and the day before, date_3 goes one day further back, and so on.
Then you can do an equi-join in Hive.
Using a view like:
select date, date from date_dim
union all
select date, date_add(date, -1) from date_dim
union all
select date, date_add(date, -2) from date_dim
union all
select date, date_add(date, -3) from date_dim
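The windowed AVG from the start of this answer can be tried without a Hive cluster. The sketch below is an assumption-laden stand-in: it runs the same `ROWS BETWEEN 2 PRECEDING AND CURRENT ROW` frame in SQLite through Python's built-in sqlite3 module (window functions need SQLite 3.25 or later), using the question's figures read as whole clicks.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE clicktable (date TEXT, clicks INTEGER)")
conn.executemany(
    "INSERT INTO clicktable VALUES (?, ?)",
    [
        ("2012-05-01", 2230), ("2012-05-02", 3150), ("2012-05-03", 5520),
        ("2012-05-04", 1330), ("2012-05-05", 2260), ("2012-05-06", 3540),
        ("2012-05-07", 2330),
    ],
)

rows = conn.execute(
    """
    SELECT date, clicks,
           AVG(clicks) OVER (ORDER BY date
                             ROWS BETWEEN 2 PRECEDING AND CURRENT ROW) AS moving_avg
    FROM clicktable
    ORDER BY date
    """
).fetchall()
for row in rows:
    print(row)
```

The first two rows are averaged over one and two days respectively, because the frame simply contains fewer rows there; wrap the expression in a CASE on a row count if those should be NULL instead.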
NOTE: THIS IS NOT AN ANSWER but an enhanced code sample of Diego Scaravaggi's answer. I am posting it as an answer because the comment section is insufficient. Note that I have parameterized the period for the moving average.
declare @p int = 3
declare @t table(d int, bal float)
insert into @t values
(1,94),
(2,99),
(3,76),
(4,74),
(5,48),
(6,55),
(7,90),
(8,77),
(9,16),
(10,19),
(11,66),
(12,47)
select a.d, avg(b.bal)
from
@t a
left join @t b on b.d between a.d-(@p-1) and a.d
group by a.d
--@p1 is period of moving average, @o1 is offset
declare @p1 as int
declare @o1 as int
set @p1 = 5;
set @o1 = 3;
with np as(
select *, rank() over(partition by cmdty, tenor order by markdt) as r
from p_prices p1
where
1=1
)
, x1 as (
select s1.*, avg(s2.val) as avgval from np s1
inner join np s2
on s1.cmdty = s2.cmdty and s1.tenor = s2.tenor
and s2.r between s1.r - (@p1 - 1) - (@o1) and s1.r - (@o1)
group by s1.cmdty, s1.tenor, s1.markdt, s1.val, s1.r
)
select * from x1 order by cmdty, tenor, markdt;
I'm not sure that your expected result (output) shows classic "simple moving (rolling) average" for 3 days. Because, for example, the first triple of numbers by definition gives:
ThreeDaysMovingAverage = (2.230 + 3.150 + 5.520) / 3 = 3.6333333
but you expect 4.360 and it's confusing.
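The arithmetic is quick to check for every row. This Python sketch (the variable names are illustrative) reads the question's figures as whole clicks, i.e. 2,230 as 2230, which only moves the decimal point relative to the calculation above, and computes a trailing three-day mean with the first two days left empty.

```python
clicks = [2230, 3150, 5520, 1330, 2260, 3540, 2330]
window = 3

# Trailing mean over the current day and the two days before it;
# None where fewer than `window` days are available.
moving_avg = [
    None if i + 1 < window else sum(clicks[i + 1 - window : i + 1]) / window
    for i in range(len(clicks))
]
rounded = [None if v is None else round(v, 2) for v in moving_avg]
print(rounded)
# [None, None, 3633.33, 3333.33, 3036.67, 2376.67, 2710.0]
```

Under that reading, none of the computed values equals the corresponding figure in the question's "3 day Moving Average" column.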
Nevertheless, I suggest the following solution, which uses window-function AVG. This approach is much more efficient (clear and less resource-intensive) than SELF-JOIN introduced in other answers (and I'm surprised that no one has given a better solution).
-- Oracle-SQL dialect
with
data_table as (
select date '2012-05-01' AS dt, 2.230 AS clicks from dual union all
select date '2012-05-02' AS dt, 3.150 AS clicks from dual union all
select date '2012-05-03' AS dt, 5.520 AS clicks from dual union all
select date '2012-05-04' AS dt, 1.330 AS clicks from dual union all
select date '2012-05-05' AS dt, 2.260 AS clicks from dual union all
select date '2012-05-06' AS dt, 3.540 AS clicks from dual union all
select date '2012-05-07' AS dt, 2.330 AS clicks from dual
),
param as (select 3 days from dual)
select
dt AS "Date",
clicks AS "Clicks",
case when rownum >= p.days then
avg(clicks) over (order by dt
rows between p.days - 1 preceding and current row)
end
AS "3 day Moving Average"
from data_table t, param p;
You see that AVG is wrapped with case when rownum >= p.days then to force NULLs in first rows, where "3 day Moving Average" is meaningless.
We can apply Joe Celko's "dirty" left outer join method (as cited above by Diego Scaravaggi) to answer the question as it was asked.
declare @ClicksTable table ([Date] date, Clicks int)
insert into @ClicksTable
select '2012-05-01', 2230 union all
select '2012-05-02', 3150 union all
select '2012-05-03', 5520 union all
select '2012-05-04', 1330 union all
select '2012-05-05', 2260 union all
select '2012-05-06', 3540 union all
select '2012-05-07', 2330
This query:
SELECT
T1.[Date],
T1.Clicks,
-- AVG ignores NULL values so we have to explicitly NULLify
-- the days when we don't have a full 3-day sample
CASE WHEN count(T2.[Date]) < 3 THEN NULL
ELSE AVG(T2.Clicks)
END AS [3-Day Moving Average]
FROM @ClicksTable T1
LEFT OUTER JOIN @ClicksTable T2
ON T2.[Date] BETWEEN DATEADD(d, -2, T1.[Date]) AND T1.[Date]
GROUP BY T1.[Date], T1.Clicks
Generates the requested output:
Date Clicks 3-Day Moving Average
2012-05-01 2,230
2012-05-02 3,150
2012-05-03 5,520 4,360
2012-05-04 1,330 3,330
2012-05-05 2,260 3,120
2012-05-06 3,540 3,320
2012-05-07 2,330 3,010