I am trying to write a SQL query that finds the maximum number of consecutive months with payments over the last 12 months, excluding June and July.
For example, I have an initial table as follows:
+---------+--------------+-----------+------------+
| id | Payment | amount | Date |
+---------+--------------+-----------+------------+
| 1 | CJ1 | 70000 | 11/3/2020 |
| 1 | 1B4 | 36314000 | 12/1/2020 |
| 1 | I21 | 119439000 | 1/12/2021 |
| 1 | 0QO | 9362100 | 2/2/2021 |
| 1 | 1G0 | 140431000 | 2/23/2021 |
| 1 | 1G | 9362100 | 3/2/2021 |
| 1 | g5d | 9362100 | 4/6/2021 |
| 1 | rt5s | 13182500 | 4/13/2021 |
| 1 | fgs5 | 48598 | 5/18/2021 |
| 1 | sd8 | 42155 | 5/25/2021 |
| 1 | wqe8 | 47822355 | 7/20/2021 |
| 1 | cbg8 | 4589721 | 7/27/2021 |
| 1 | jlk8 | 4589721 | 8/3/2021 |
| 1 | cxn9 | 4589721 | 10/5/2021 |
| 1 | qwe | 45897210 | 11/9/2021 |
| 1 | mmm | 45897210 | 12/16/2021 |
+---------+--------------+-----------+------------+
I have written below query:
SELECT customer_number, year, month,
month - lag(month) OVER(partition by customer_number ORDER BY year, month) as previous_month_indicator
FROM
(
SELECT DISTINCT Month(date) as month, Year(date) as year, CUSTOMER_NUMBER
FROM Table1
WHERE Month(date) not in (6,7)
and TO_DATE(date,'yyyy-MM-dd') >= DATE_SUB('2021-12-31', 425)
and customer_number = 1
) As C
and I get this output
+-----------------+------+-------+--------------------------+
| customer_number | year | month | previous_month_indicator |
+-----------------+------+-------+--------------------------+
| 1 | 2020 | 11 | null |
| 1 | 2020 | 12 | 1 |
| 1 | 2021 | 1 | -11 |
| 1 | 2021 | 2 | 1 |
| 1 | 2021 | 3 | 1 |
| 1 | 2021 | 4 | 1 |
| 1 | 2021 | 5 | 1 |
| 1 | 2021 | 8 | 3 |
| 1 | 2021 | 10 | 2 |
| 1 | 2021 | 11 | 1 |
+-----------------+------+-------+--------------------------+
What I want is to get a view like this
Expected output
+-----------------+------+-------+--------------------------+
| customer_number | year | month | previous_month_indicator |
+-----------------+------+-------+--------------------------+
| 1 | 2020 | 11 | 1 |
| 1 | 2020 | 12 | 1 |
| 1 | 2021 | 1 | 1 |
| 1 | 2021 | 2 | 1 |
| 1 | 2021 | 3 | 1 |
| 1 | 2021 | 4 | 1 |
| 1 | 2021 | 5 | 1 |
| 1 | 2021 | 8 | 1 |
| 1 | 2021 | 9 | 0 |
| 1 | 2021 | 10 | 1 |
| 1 | 2021 | 11 | 1 |
+-----------------+------+-------+--------------------------+
Since June and July do not matter, August should be treated as the month that directly follows May. September has no record, so it appears as 0 and breaks the chain of consecutive months.
My final desired output is the maximum number of consecutive months in which transactions were made, which in the above case is 8 (Nov-2020 to Aug-2021).
Final Desired Output:
+-----------------+-------------------------+
| customer_number | Max_consecutive_months |
+-----------------+-------------------------+
| 1 | 8 |
+-----------------+-------------------------+
CTEs can break this down a little easier. In the code below, the payment_streak CTE is the key bit: the start_of_streak field first marks the rows that count as the start of a streak, and then takes the maximum over the current and all previous rows, which gives the start of the streak each row belongs to.
The last SELECT only compares these two dates, computes how many months lie between them (excluding June/July, so a full year counts as 10 months), and then finds the best streak per customer.
WITH payments_in_context AS (
SELECT customer_number,
date,
lag(date) OVER (PARTITION BY customer_number ORDER BY date) AS prev_date
FROM Table1
WHERE EXTRACT(month FROM date) NOT IN (6,7)
),
payment_streak AS (
SELECT
customer_number,
date,
max(
CASE WHEN (prev_date IS NULL)
OR (EXTRACT(month FROM date) <> 8
AND (date - prev_date >= 62
OR MOD(12 + EXTRACT(month FROM date) - EXTRACT(month FROM prev_date),12) > 1))
OR (EXTRACT(month FROM date) = 8
AND (date - prev_date >= 123
OR EXTRACT(month FROM prev_date) NOT IN (5,8)))
THEN date END
) OVER (PARTITION BY customer_number ORDER BY date ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW)
as start_of_streak
FROM payments_in_context
)
SELECT customer_number,
max( 1 +
10*(EXTRACT(year FROM date) - EXTRACT(year FROM start_of_streak))
+ (EXTRACT(month FROM date) - EXTRACT(month FROM start_of_streak))
+ CASE WHEN (EXTRACT(month FROM date) > 7 AND EXTRACT(month FROM start_of_streak) < 6)
THEN -2
WHEN (EXTRACT(month FROM date) < 6 AND EXTRACT(month FROM start_of_streak) > 7)
THEN 2
ELSE 0 END
) AS max_consecutive_months
FROM payment_streak
GROUP BY 1;
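As a cross-check of the expected answer, here is a minimal Python sketch of the same idea, not a replacement for the SQL. It assumes each year has ten countable months (June and July removed), which is also why the query multiplies the year difference by 10. The names month_index and max_consecutive_months are made up for this illustration, and the dates are the sample rows from the question, without the 12-month filter.

```python
from datetime import date

# Months that count toward a streak; June (6) and July (7) are skipped.
COUNTED_MONTHS = [1, 2, 3, 4, 5, 8, 9, 10, 11, 12]
MONTH_POSITION = {m: i for i, m in enumerate(COUNTED_MONTHS)}


def month_index(d: date) -> int:
    """Map a date to a running index on a ten-months-per-year axis."""
    return d.year * 10 + MONTH_POSITION[d.month]


def max_consecutive_months(dates):
    """Longest run of adjacent counted months that contain a payment."""
    indexes = sorted({month_index(d) for d in dates if d.month in MONTH_POSITION})
    best = run = 0
    previous = None
    for idx in indexes:
        # Extend the run if this month directly follows the previous one.
        run = run + 1 if previous is not None and idx == previous + 1 else 1
        best = max(best, run)
        previous = idx
    return best


# Payment dates for customer 1, copied from the question's sample table.
payments = [
    date(2020, 11, 3), date(2020, 12, 1), date(2021, 1, 12), date(2021, 2, 2),
    date(2021, 2, 23), date(2021, 3, 2), date(2021, 4, 6), date(2021, 4, 13),
    date(2021, 5, 18), date(2021, 5, 25), date(2021, 7, 20), date(2021, 7, 27),
    date(2021, 8, 3), date(2021, 10, 5), date(2021, 11, 9), date(2021, 12, 16),
]

print(max_consecutive_months(payments))  # 8 (Nov-2020 through Aug-2021)
```

On this axis May and August are neighbours, so the missing September is the first gap that breaks the streak.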
You can use a recursive CTE to generate all the dates in the twelve-month timespan for each customer id, and then find the maximum number of consecutive dates, excluding June and July, in that interval:
with recursive cte(id, m, c) as (
select cust_id, min(date), 1 from payments group by cust_id
union all
select c.id, c.m + interval 1 month, c.c+1 from cte c where c.c <= 12
),
dts(id, m, f) as (
select c.id, c.m, c.c = 1 or exists (
   select 1 from payments p
   where p.cust_id = c.id
     and extract(month from p.date) = extract(month from (c.m - interval 1 month))
     and extract(year from p.date) = extract(year from (c.m - interval 1 month))
)
from cte c where extract(month from c.m) not in (6,7)
),
result(id, f, c) as (
select d.id, d.f, (select sum(d.id = d1.id and d1.m < d.m and d1.f = 0)+1 from dts d1)
from dts d where d.f != 0
)
select r1.id, max(r1.s)-1 from (select r.id, r.c, sum(r.f) s from result r group by r.id, r.c) r1 group by r1.id
I am currently working on a report in Microsoft Query and ran into a problem: I need to calculate the total amount of money for the past year.
The table looks like this:
Item Number | Month | Year | Amount |
...........PAST YEARS DATA...........
12345 | 1 | 2019 | 10 |
12345 | 2 | 2019 | 20 |
12345 | 3 | 2019 | 15 |
12345 | 4 | 2019 | 12 |
12345 | 5 | 2019 | 11 |
12345 | 6 | 2019 | 12 |
12345 | 7 | 2019 | 12 |
12345 | 8 | 2019 | 10 |
12345 | 9 | 2019 | 10 |
12345 | 10 | 2019 | 10 |
12345 | 11 | 2019 | 10 |
12345 | 12 | 2019 | 10 |
12345 | 1 | 2020 | 10 |
12345 | 2 | 2020 | 10 |
How would you calculate the total amount from 02-2019 to 02-2020 for the item number 12345?
Assuming that you are running SQL Server, you can recreate a date with datefromparts() and use it for filtering:
select sum(amount)
from mytable
where
itemnumber = 12345
and datefromparts(year, month, 1) >= '20190201'
and datefromparts(year, month, 1) < '20200301'
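To see which rows that half-open date range picks up, here is a small Python sketch of the same filter. It is only an illustration: total_amount is a made-up helper, and the rows are the sample data from the question.

```python
from datetime import date

# (item_number, month, year, amount) rows copied from the question.
rows = [
    (12345, 1, 2019, 10), (12345, 2, 2019, 20), (12345, 3, 2019, 15),
    (12345, 4, 2019, 12), (12345, 5, 2019, 11), (12345, 6, 2019, 12),
    (12345, 7, 2019, 12), (12345, 8, 2019, 10), (12345, 9, 2019, 10),
    (12345, 10, 2019, 10), (12345, 11, 2019, 10), (12345, 12, 2019, 10),
    (12345, 1, 2020, 10), (12345, 2, 2020, 10),
]


def total_amount(rows, item_number, start, end_exclusive):
    """Sum amounts whose first-of-month date falls in [start, end_exclusive)."""
    return sum(
        amount
        for item, month, year, amount in rows
        if item == item_number and start <= date(year, month, 1) < end_exclusive
    )


# 02-2019 through 02-2020 inclusive: the upper bound is the first day of 03-2020.
total = total_amount(rows, 12345, date(2019, 2, 1), date(2020, 3, 1))
print(total)  # 152
```

Using an exclusive upper bound on the first day of the following month avoids having to know how many days the last month has.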
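You can also use this. The parentheses around the OR are needed because AND binds more tightly than OR; without them the ItemNumber filter would only apply to the 2020 branch:
SELECT sum(amount) as Amount
FROM YEARDATA
WHERE ( ( Month >=2 and year = '2019')
or ( Month <=2 and year = '2020') )
and ItemNumber = '12345'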
Let us assume a calendar week.
The week number is 02 of 2020.
I am looking for ways to find the beginning and end dates of the week.
Any pointers to built-in functions or any other approaches will be helpful.
I don't see a direct way, but with the existing date functions it is super easy to build a lookup table which you can query:
CREATE TABLE day_of_week_table AS
SELECT
date,
EXTRACT(ISOYEAR FROM date) AS isoyear,
EXTRACT(ISOWEEK FROM date) AS isoweek,
EXTRACT(WEEK FROM date) AS week,
EXTRACT(DAYOFWEEK FROM date) AS dayOfWeek
FROM UNNEST(GENERATE_DATE_ARRAY('2020-1-1', '2021-1-1')) AS date
ORDER BY date;
Here are the first few rows of this table:
| date | isoyear | isoweek | week | dayOfWeek |
+------------+---------+---------+------+-----------+
| 2020-01-01 | 2020 | 1 | 0 | 4 |
| 2020-01-02 | 2020 | 1 | 0 | 5 |
| 2020-01-03 | 2020 | 1 | 0 | 6 |
| 2020-01-04 | 2020 | 1 | 0 | 7 |
| 2020-01-05 | 2020 | 1 | 1 | 1 |
| 2020-01-06 | 2020 | 2 | 1 | 2 |
| 2020-01-07 | 2020 | 2 | 1 | 3 |
| 2020-01-08 | 2020 | 2 | 1 | 4 |
| 2020-01-09 | 2020 | 2 | 1 | 5 |
| 2020-01-10 | 2020 | 2 | 1 | 6 |
| 2020-01-11 | 2020 | 2 | 1 | 7 |
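For comparison, the bounds of a week can also be computed directly. The following Python sketch assumes ISO weeks (Monday to Sunday), which matches the isoweek column above but not the Sunday-based week column; iso_week_bounds is a made-up helper name.

```python
from datetime import date, timedelta


def iso_week_bounds(iso_year: int, iso_week: int):
    """Return (Monday, Sunday) of the given ISO week."""
    # date.fromisocalendar is available in Python 3.8 and later.
    monday = date.fromisocalendar(iso_year, iso_week, 1)
    return monday, monday + timedelta(days=6)


start, end = iso_week_bounds(2020, 2)
print(start, end)  # 2020-01-06 2020-01-12
```

This agrees with the lookup table, where 2020-01-06 is the first row with isoweek 2.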
I have a SELECT statement that requires DATEFIRST = 1 (Monday).
I am in the US, so the default is 7 (Sunday).
How can I modify this to embed the DATEFIRST setting in the SELECT statement so I can create it as a view?
SET DATEFIRST 1;
SELECT
T_APPLICANT.APPL_ID AS empID,
T_APPLICANT.APPL_LASTNAME,
T_APPLICANT.APPL_FIRSTNAME,
T_APPLICANT_ASSIGNMENT.ASS_STARTDATE,
DATEPART(ww, dbo.T_APPLICANT_ASSIGNMENT.ASS_STARTDATE) AS WeekNo,
DATEPART(WEEKDAY, dbo.T_APPLICANT_ASSIGNMENT.ASS_STARTDATE) AS WeekDay,
DATEPART(ww, GETDATE()) AS CurWeekNo,
(T_APPLICANT_ASSIGNMENT.ASS_HOURS) AS Total_Assigned_hrs,
(T_APPLICANT_ASSIGNMENT.ASS_BILL) AS AvgBill_Rate,
(T_APPLICANT_ASSIGNMENT.ASS_PAY) AS AvgPay_Rate,
(T_APPLICANT_ASSIGNMENT.ASS_HOURS * T_APPLICANT_ASSIGNMENT.ASS_PAY) AS Total_AmtPaid,
(T_APPLICANT_ASSIGNMENT.ASS_HOURS * T_APPLICANT_ASSIGNMENT.ASS_BILL) AS Total_AmtBilled,
(LTRIM(STR(DATEPART(yy, T_APPLICANT_ASSIGNMENT.ASS_STARTDATE))) + '-'
+ LTRIM(STR(DATEPART(M, T_APPLICANT_ASSIGNMENT.ASS_STARTDATE)))
) AS YearMo
FROM
T_APPLICANT
RIGHT OUTER JOIN
T_APPLICANT_ASSIGNMENT
ON T_APPLICANT.APPL_ID = T_APPLICANT_ASSIGNMENT.APPL_ID
WHERE
DATEPART(ww, dbo.T_APPLICANT_ASSIGNMENT.ASS_STARTDATE)
BETWEEN DATEPART(ww, GETDATE()) AND DATEPART(ww, GETDATE()) + 1
AND DATEPART(yy, T_APPLICANT_ASSIGNMENT.ASS_STARTDATE) = DATEPART(yy, GETDATE())
AND ASS_STATUS = 'A';
Unless proven otherwise, you can't set DATEFIRST in a view.
Nor in a user-defined function.
So how can a view return week & weekday numbers as if DATEFIRST was set to 1?
That requires different calculations.
I haven't figured out yet how to calculate the week number, regardless of the DATEFIRST setting, as if the weeks started on Monday.
That's a tricky one.
I know, one could link to a Calendar table with the week numbers.
But that's not the goal here.
However, the WEEKDAY can also be calculated without using DATEPART.
For example by combining a CASE with a FORMAT.
Because the names of the weekdays remain the same, regardless of the DATEFIRST setting.
And an ISO_WEEK also starts on Monday.
So it can be used in the WHERE clause to filter on the current week & next week.
create table testdatefirst (
id int primary key not null identity(1,1),
dt date not null
)
GO
with rcte as
(
select cast('2018-12-24' as date) dt
union all
select dateadd(day, 1, dt)
from rcte
where dt < cast('2019-03-01' as date)
)
insert into testdatefirst (dt)
select *
from rcte
order by dt
GO
68 rows affected
CREATE view vw_testdatefirst AS
select dt
, FORMAT(dt,'ddd','en-GB') as [dayname]
, DATEPART(WEEKDAY, dt) as [weekday]
, DATEPART(WEEK, dt) as [week]
-- , DATEPART(ISO_WEEK, dt) as [ISO_WEEK]
, case FORMAT(dt,'ddd','en-GB')
when 'Mon' then 1
when 'Tue' then 2
when 'Wed' then 3
when 'Thu' then 4
when 'Fri' then 5
when 'Sat' then 6
when 'Sun' then 7
end as [weekday2]
, (((DATEPART(WEEKDAY, dt) + @@DATEFIRST-2)%7)+1) AS [weekday3]
from testdatefirst
where DATEPART(ISO_WEEK, dt) between DATEPART(ISO_WEEK, '2019-01-01') and DATEPART(ISO_WEEK, '2019-01-01')+1
GO
set datefirst 7;
GO
select @@datefirst as [datefirst];
select * from vw_testdatefirst order by dt;
GO
| datefirst |
| :-------- |
| 7 |
dt | dayname | weekday | week | weekday2 | weekday3
:------------------ | :------ | ------: | ---: | -------: | -------:
31/12/2018 00:00:00 | Mon | 2 | 53 | 1 | 1
01/01/2019 00:00:00 | Tue | 3 | 1 | 2 | 2
02/01/2019 00:00:00 | Wed | 4 | 1 | 3 | 3
03/01/2019 00:00:00 | Thu | 5 | 1 | 4 | 4
04/01/2019 00:00:00 | Fri | 6 | 1 | 5 | 5
05/01/2019 00:00:00 | Sat | 7 | 1 | 6 | 6
06/01/2019 00:00:00 | Sun | 1 | 2 | 7 | 7
07/01/2019 00:00:00 | Mon | 2 | 2 | 1 | 1
08/01/2019 00:00:00 | Tue | 3 | 2 | 2 | 2
09/01/2019 00:00:00 | Wed | 4 | 2 | 3 | 3
10/01/2019 00:00:00 | Thu | 5 | 2 | 4 | 4
11/01/2019 00:00:00 | Fri | 6 | 2 | 5 | 5
12/01/2019 00:00:00 | Sat | 7 | 2 | 6 | 6
13/01/2019 00:00:00 | Sun | 1 | 3 | 7 | 7
set datefirst 1;
GO
select @@datefirst as [datefirst];
select * from vw_testdatefirst order by dt;
GO
| datefirst |
| :-------- |
| 1 |
dt | dayname | weekday | week | weekday2 | weekday3
:------------------ | :------ | ------: | ---: | -------: | -------:
31/12/2018 00:00:00 | Mon | 1 | 53 | 1 | 1
01/01/2019 00:00:00 | Tue | 2 | 1 | 2 | 2
02/01/2019 00:00:00 | Wed | 3 | 1 | 3 | 3
03/01/2019 00:00:00 | Thu | 4 | 1 | 4 | 4
04/01/2019 00:00:00 | Fri | 5 | 1 | 5 | 5
05/01/2019 00:00:00 | Sat | 6 | 1 | 6 | 6
06/01/2019 00:00:00 | Sun | 7 | 1 | 7 | 7
07/01/2019 00:00:00 | Mon | 1 | 2 | 1 | 1
08/01/2019 00:00:00 | Tue | 2 | 2 | 2 | 2
09/01/2019 00:00:00 | Wed | 3 | 2 | 3 | 3
10/01/2019 00:00:00 | Thu | 4 | 2 | 4 | 4
11/01/2019 00:00:00 | Fri | 5 | 2 | 5 | 5
12/01/2019 00:00:00 | Sat | 6 | 2 | 6 | 6
13/01/2019 00:00:00 | Sun | 7 | 2 | 7 | 7
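The weekday3 column gives the same Monday-based number under both settings because adding the DATEFIRST value back cancels the shift that DATEPART applies. The Python sketch below checks that arithmetic for all seven settings. The function datepart_weekday is only an assumed model of how DATEPART(WEEKDAY, ...) numbers days (it reproduces the weekday column in both result sets above); it is not SQL Server code.

```python
def datepart_weekday(iso_weekday: int, datefirst: int) -> int:
    """Assumed model of DATEPART(WEEKDAY, ...): 1 is the DATEFIRST day.

    iso_weekday and datefirst both use 1 = Monday ... 7 = Sunday.
    """
    return (iso_weekday - datefirst) % 7 + 1


def monday_based_weekday(weekday: int, datefirst: int) -> int:
    """Same arithmetic as the weekday3 column of the view."""
    return (weekday + datefirst - 2) % 7 + 1


# For every DATEFIRST setting and every day, the result is Monday = 1 ... Sunday = 7.
for datefirst in range(1, 8):
    for iso_weekday in range(1, 8):
        raw = datepart_weekday(iso_weekday, datefirst)
        assert monday_based_weekday(raw, datefirst) == iso_weekday

print(monday_based_weekday(datepart_weekday(1, 7), 7))  # 1 (a Monday under DATEFIRST 7)
```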
I have a SQL related question I would love some help with as a suitable answer has been eluding me for some time.
Background
I’m working with a vendor product that has an Oracle database as its backend. I can write any ad hoc SQL to query the underlying tables, but I cannot change their structure (or the data model itself). The table I’m interested in currently has more than 1M rows and essentially tracks user sessions. It has 4 columns of interest: session_id (the primary key, unique per session), user_name, start_date (the date the session began), and stop_date (the date the session ended). My goal is to aggregate active sessions by month, day, and hour, given a start date and an end date. I need to create a view (or 3 separate views) which can either perform the aggregation itself or serve as the intermediate object that I can then query to perform the aggregation. I understand the eventual SQL / view may actually need to be 3 different views (one for month, one for day, one for hour), but it seems to me that the concept, once achieved, should be the same regardless of the time period.
Current table example
Table Name = web_session
| Session_id | user_name | start_date | stop_date
----------------------------------------------------------------------------
| 1 | joe | 4/20/2017 10:42:10 PM | 4/21/2017 2:42:10 AM |
| 2 | matt | 4/20/2017 5:43:10 PM | 4/20/2017 5:59:10 PM |
| 3 | matt | 4/20/2017 3:42:10 PM | 4/20/2017 5:42:10 PM |
| 4 | joe | 4/20/2017 11:20:10 AM | 4/20/2017 4:42:10 PM |
| 5 | john | 4/20/2017 8:42:10 AM | 4/20/2017 11:42:10 AM |
| 6 | matt | 4/20/2017 7:42:10 AM | 4/20/2017 11:42:10 PM |
| 7 | joe | 4/19/2017 11:20:10 PM | 4/20/2017 1:42:10 AM |
Ideal Output For Hour View
-12:00 can be either 0 or 24 for the example
| Date | HR | active_sessions | distinct_users |
------------------------------------------------------------
| 4/21/2017 | 2 | 1 | 1 |
| 4/21/2017 | 1 | 1 | 1 |
| 4/20/2017 | 0 | 1 | 1 |
| 4/20/2017 | 23 | 1 | 1 |
| 4/20/2017 | 22 | 1 | 1 |
| 4/20/2017 | 17 | 2 | 1 |
| 4/20/2017 | 16 | 2 | 2 |
| 4/20/2017 | 15 | 2 | 2 |
| 4/20/2017 | 14 | 1 | 1 |
| 4/20/2017 | 13 | 1 | 1 |
| 4/20/2017 | 12 | 1 | 1 |
| 4/20/2017 | 11 | 3 | 3 |
| 4/20/2017 | 10 | 2 | 2 |
| 4/20/2017 | 9 | 2 | 2 |
| 4/20/2017 | 8 | 2 | 2 |
| 4/20/2017 | 7 | 1 | 1 |
| 4/20/2017 | 1 | 1 | 1 |
| 4/20/2017 | 0 | 1 | 1 |
| 4/19/2017 | 23 | 1 | 1 |
End Goal and Other Options
What I am eventually trying to achieve with this output is to populate a line chart which displays the number of active sessions for either a month, day, or hour (used in the example output) between two dates. In the hour example, the date in combination with the HR would be used along the X-axis and the active sessions would be used along the Y-axis. The distinct user count would be available if a user hovered over the point on the chart. FYI Active sessions are the total number of sessions that were open at any point during the interval. Distinct users are the total number of distinct users during the interval. If I logged on and off twice in the same hour, it would be 2 active sessions, but only 1 distinct user.
Alternative Solutions
This seems to be a problem which must have come up many times before, but from all of my googling and Stack Overflow research I cannot seem to find the correct approach. If I am thinking about the query or ideal output incorrectly, I AM OPEN TO ALTERNATE SUGGESTIONS which allow me to get the desired output to populate the chart appropriately on the front end.
Some SQL I Have Tried (Good Faith Effort)
There are many queries I've tried, but I'll start with this one, as it is the closest I got. It is extremely slow (unusably so) and it still does not produce the result I need.
Select * FROM (
SELECT
u.YearDt, u.MonthDt, u.DayDt, u.HourDt, u.MinDt,
COUNT(Distinct u.session_id) as unique_sessions,
COUNT(Distinct u.user_name) as unique_users,
LISTAGG(u.user_name, ', ') WITHIN GROUP (ORDER BY u.user_name ASC) as users
FROM
(SELECT EXTRACT(year FROM l.start_date) as YearDt,
EXTRACT(month FROM l.start_date) as MonthDt,
EXTRACT(day FROM l.start_date) as DayDt,
EXTRACT(HOUR FROM CAST(l.start_date AS TIMESTAMP)) as HourDt,
EXTRACT(MINUTE FROM CAST(l.start_date AS TIMESTAMP)) as MinDt,
l.session_id,
l.user_name,
l.start_date as act_date,
1 as is_start
FROM web_session l
UNION ALL
SELECT EXTRACT(year FROM l.stop_date) as YearDt,
EXTRACT(month FROM l.stop_date) as MonthDt,
EXTRACT(day FROM l.stop_date) as DayDt,
EXTRACT(HOUR FROM CAST(l.stop_date AS TIMESTAMP)) as HourDt,
EXTRACT(MINUTE FROM CAST(l.stop_date AS TIMESTAMP)) as MinDt,
l.session_id,
l.user_name,
l.stop_date as act_date,
0 as is_start
FROM web_session l
) u
GROUP BY CUBE ( u.YearDt, u.MonthDt, u.DayDt, u.HourDt, u.MinDt)
) c
You can use a recursive CTE (Query 1) or a correlated hierarchical query (Query 2) to generate the hours within each session's time range and then aggregate. This only requires a single table scan:
Oracle 11g R2 Schema Setup:
CREATE TABLE Web_Session ( Session_id, user_name, start_date, stop_date ) AS
SELECT 1, 'joe', CAST( TIMESTAMP '2017-04-20 22:42:10' AS DATE ), CAST( TIMESTAMP '2017-04-21 02:42:10' AS DATE ) FROM DUAL UNION ALL
SELECT 2, 'matt', TIMESTAMP '2017-04-20 17:43:10', TIMESTAMP '2017-04-20 17:59:10' FROM DUAL UNION ALL
SELECT 3, 'matt', TIMESTAMP '2017-04-20 15:42:10', TIMESTAMP '2017-04-20 17:42:10' FROM DUAL UNION ALL
SELECT 4, 'joe', TIMESTAMP '2017-04-20 11:20:10', TIMESTAMP '2017-04-20 16:42:10' FROM DUAL UNION ALL
SELECT 5, 'john', TIMESTAMP '2017-04-20 08:42:10', TIMESTAMP '2017-04-20 11:42:10' FROM DUAL UNION ALL
SELECT 6, 'matt', TIMESTAMP '2017-04-20 07:42:10', TIMESTAMP '2017-04-20 23:42:10' FROM DUAL UNION ALL
SELECT 7, 'joe', TIMESTAMP '2017-04-19 23:20:10', TIMESTAMP '2017-04-20 01:42:10' FROM DUAL;
Query 1:
WITH hours ( session_id, user_name, hour, duration ) AS (
SELECT session_id,
user_name,
CAST( TRUNC( start_date, 'HH24' ) AS DATE ),
( TRUNC( stop_date, 'HH24' ) - TRUNC( start_date, 'HH24' ) ) * 24
FROM web_session
UNION ALL
SELECT session_id,
user_name,
hour + INTERVAL '1' HOUR, -- There is a bug in SQLFiddle that subtracts
-- hours instead of adding so -1 is used there.
duration - 1
FROM hours
WHERE duration > 0
)
SELECT hour,
COUNT( session_id ) AS active_sessions,
COUNT( DISTINCT user_name ) AS distinct_users
FROM hours
GROUP BY hour
ORDER BY hour
Results:
| HOUR | ACTIVE_SESSIONS | DISTINCT_USERS |
|----------------------|-----------------|----------------|
| 2017-04-19T23:00:00Z | 1 | 1 |
| 2017-04-20T00:00:00Z | 1 | 1 |
| 2017-04-20T01:00:00Z | 1 | 1 |
| 2017-04-20T07:00:00Z | 1 | 1 |
| 2017-04-20T08:00:00Z | 2 | 2 |
| 2017-04-20T09:00:00Z | 2 | 2 |
| 2017-04-20T10:00:00Z | 2 | 2 |
| 2017-04-20T11:00:00Z | 3 | 3 |
| 2017-04-20T12:00:00Z | 2 | 2 |
| 2017-04-20T13:00:00Z | 2 | 2 |
| 2017-04-20T14:00:00Z | 2 | 2 |
| 2017-04-20T15:00:00Z | 3 | 2 |
| 2017-04-20T16:00:00Z | 3 | 2 |
| 2017-04-20T17:00:00Z | 3 | 1 |
| 2017-04-20T18:00:00Z | 1 | 1 |
| 2017-04-20T19:00:00Z | 1 | 1 |
| 2017-04-20T20:00:00Z | 1 | 1 |
| 2017-04-20T21:00:00Z | 1 | 1 |
| 2017-04-20T22:00:00Z | 2 | 2 |
| 2017-04-20T23:00:00Z | 2 | 2 |
| 2017-04-21T00:00:00Z | 1 | 1 |
| 2017-04-21T01:00:00Z | 1 | 1 |
| 2017-04-21T02:00:00Z | 1 | 1 |
Execution Plan:
-------------------------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost | Time |
-------------------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 14 | 364 | 7 | 00:00:01 |
| 1 | SORT GROUP BY | | 14 | 364 | 7 | 00:00:01 |
| 2 | VIEW | VW_DAG_0 | 14 | 364 | 7 | 00:00:01 |
| 3 | HASH GROUP BY | | 14 | 364 | 7 | 00:00:01 |
| 4 | VIEW | | 14 | 364 | 6 | 00:00:01 |
| 5 | UNION ALL (RECURSIVE WITH) BREADTH FIRST | | | | | |
| 6 | TABLE ACCESS FULL | WEB_SESSION | 7 | 245 | 3 | 00:00:01 |
| * 7 | RECURSIVE WITH PUMP | | | | | |
-------------------------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
------------------------------------------
* 7 - filter("DURATION">0)
Note
-----
- dynamic sampling used for this statement
Query 2:
SELECT t.COLUMN_VALUE AS hour,
COUNT( session_id ) AS active_sessions,
COUNT( DISTINCT user_name ) AS distinct_users
FROM web_session w
CROSS JOIN
TABLE(
CAST(
MULTISET(
SELECT TRUNC( w.start_date, 'HH24' ) + ( LEVEL - 1 ) / 24
FROM DUAL
CONNECT BY TRUNC( w.start_date, 'HH24' ) + ( LEVEL - 1 ) / 24 < w.stop_date
) AS SYS.ODCIDATELIST
)
) t
GROUP BY t.COLUMN_VALUE
ORDER BY hour
Results:
| HOUR | ACTIVE_SESSIONS | DISTINCT_USERS |
|----------------------|-----------------|----------------|
| 2017-04-19T23:00:00Z | 1 | 1 |
| 2017-04-20T00:00:00Z | 1 | 1 |
| 2017-04-20T01:00:00Z | 1 | 1 |
| 2017-04-20T07:00:00Z | 1 | 1 |
| 2017-04-20T08:00:00Z | 2 | 2 |
| 2017-04-20T09:00:00Z | 2 | 2 |
| 2017-04-20T10:00:00Z | 2 | 2 |
| 2017-04-20T11:00:00Z | 3 | 3 |
| 2017-04-20T12:00:00Z | 2 | 2 |
| 2017-04-20T13:00:00Z | 2 | 2 |
| 2017-04-20T14:00:00Z | 2 | 2 |
| 2017-04-20T15:00:00Z | 3 | 2 |
| 2017-04-20T16:00:00Z | 3 | 2 |
| 2017-04-20T17:00:00Z | 3 | 1 |
| 2017-04-20T18:00:00Z | 1 | 1 |
| 2017-04-20T19:00:00Z | 1 | 1 |
| 2017-04-20T20:00:00Z | 1 | 1 |
| 2017-04-20T21:00:00Z | 1 | 1 |
| 2017-04-20T22:00:00Z | 2 | 2 |
| 2017-04-20T23:00:00Z | 2 | 2 |
| 2017-04-21T00:00:00Z | 1 | 1 |
| 2017-04-21T01:00:00Z | 1 | 1 |
| 2017-04-21T02:00:00Z | 1 | 1 |
Execution Plan:
--------------------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost | Time |
--------------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 57176 | 2115512 | 200 | 00:00:03 |
| 1 | SORT GROUP BY | | 57176 | 2115512 | 200 | 00:00:03 |
| 2 | NESTED LOOPS | | 57176 | 2115512 | 195 | 00:00:03 |
| 3 | TABLE ACCESS FULL | WEB_SESSION | 7 | 245 | 3 | 00:00:01 |
| 4 | COLLECTION ITERATOR SUBQUERY FETCH | | 8168 | 16336 | 27 | 00:00:01 |
| * 5 | CONNECT BY WITHOUT FILTERING | | | | | |
| 6 | FAST DUAL | | 1 | | 2 | 00:00:01 |
--------------------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
------------------------------------------
* 5 - filter(TRUNC(:B1,'fmhh24')+(LEVEL-1)/24<:B2)
Note
-----
- dynamic sampling used for this statement
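As a cross-check of the results shown for both queries, here is a minimal Python sketch of the same expand-then-aggregate idea. It is only an illustration: hourly_activity is a made-up name, the rows are the sample data from the schema setup, and the loop follows Query 2's condition (the hour bucket must start before stop_date).

```python
from collections import defaultdict
from datetime import datetime, timedelta

# (session_id, user_name, start_date, stop_date) copied from the schema setup.
sessions = [
    (1, "joe", datetime(2017, 4, 20, 22, 42, 10), datetime(2017, 4, 21, 2, 42, 10)),
    (2, "matt", datetime(2017, 4, 20, 17, 43, 10), datetime(2017, 4, 20, 17, 59, 10)),
    (3, "matt", datetime(2017, 4, 20, 15, 42, 10), datetime(2017, 4, 20, 17, 42, 10)),
    (4, "joe", datetime(2017, 4, 20, 11, 20, 10), datetime(2017, 4, 20, 16, 42, 10)),
    (5, "john", datetime(2017, 4, 20, 8, 42, 10), datetime(2017, 4, 20, 11, 42, 10)),
    (6, "matt", datetime(2017, 4, 20, 7, 42, 10), datetime(2017, 4, 20, 23, 42, 10)),
    (7, "joe", datetime(2017, 4, 19, 23, 20, 10), datetime(2017, 4, 20, 1, 42, 10)),
]


def hourly_activity(sessions):
    """Map each hour bucket to (active_sessions, distinct_users)."""
    buckets = defaultdict(list)
    for session_id, user, start, stop in sessions:
        # Start at the top of the hour in which the session began.
        hour = start.replace(minute=0, second=0, microsecond=0)
        while hour < stop:
            buckets[hour].append((session_id, user))
            hour += timedelta(hours=1)
    return {
        hour: (len(entries), len({user for _, user in entries}))
        for hour, entries in sorted(buckets.items())
    }


activity = hourly_activity(sessions)
print(activity[datetime(2017, 4, 20, 17)])  # (3, 1)
```

The 17:00 bucket has three sessions but one distinct user because all three belong to matt, which matches the result tables above.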
I think something like this will work:
WITH ct ( active_dt ) AS (
-- Build the query for the "table" of hours
SELECT DATE'2017-04-19' + (LEVEL-1)/24 AS active_dt FROM dual
CONNECT BY DATE'2017-04-19' + (LEVEL-1)/24 < DATE'2017-04-22'
)
SELECT active_dt AS "Date", active_hr AS "HR"
, COUNT(session_id) AS active_sessions
, COUNT(DISTINCT user_name) AS distinct_users
FROM (
SELECT TRUNC(ct.active_dt) AS active_dt
, TO_CHAR(ct.active_dt, 'HH24') AS active_hr
, ws.session_id, ws.user_name
FROM ct LEFT JOIN web_session ws
ON ct.active_dt + 1/24 > ws.start_date
AND ct.active_dt < ws.stop_date
) GROUP BY active_dt, active_hr
ORDER BY active_dt DESC, active_hr DESC;
I may not have the conditions for the LEFT JOIN 100% correct.
Hope this helps.
Matt,
What you need to do is generate a time dimension either as a static table or dynamically at run time:
create table time_dim (
ts date primary key,
year number not null,
month number not null,
day number not null,
wday number not null,
dy varchar2(3) not null,
hr number not null
);
insert into time_dim (ts, year, month, day, wday, dy, hr)
select ts
, extract(year from ts) year
, extract(month from ts) month
, extract(day from ts) day
, to_char(ts,'d') wday
, to_char(ts,'dy') dy
, to_number(to_char(ts,'HH24')) hr
from (
select DATE '2017-01-01' + (level - 1)/24 ts
FROM DUAL connect by level <= 365*24) a;
Then outer join that to your web_session table:
select t.ts, t.year, t.month, t.wday, t.dy, t.hr
, count(session_id) sessions
, count(distinct user_name) users
from time_dim t
left join web_session w
on t.ts between trunc(w.start_date, 'hh24') and w.stop_date
where trunc(t.ts) between date '2017-04-19' and date '2017-04-21'
group by rollup (t.year, t.month, (t.wday, t.dy), (t.hr, t.ts));
You can change up the group by clause to get the various aggregates you're interested in.
In the above code, I'm truncating the start_date to the hour in the ON clause so that the start hour is included in the results; otherwise, sessions that don't start exactly at the top of the hour would not be counted in that hour.
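To make the effect of that truncation concrete, here is a tiny Python sketch using session 1 from the question (22:42:10 to 02:42:10) and the 22:00 bucket. The helper truncate_to_hour is a made-up stand-in for trunc(start_date, 'hh24').

```python
from datetime import datetime


def truncate_to_hour(ts: datetime) -> datetime:
    """Stand-in for trunc(ts, 'hh24'): drop minutes and seconds."""
    return ts.replace(minute=0, second=0, microsecond=0)


start = datetime(2017, 4, 20, 22, 42, 10)
stop = datetime(2017, 4, 21, 2, 42, 10)
bucket = datetime(2017, 4, 20, 22)  # the 22:00 row of time_dim

# BETWEEN on the raw start_date misses the hour in which the session began.
without_trunc = start <= bucket <= stop
# Truncating start_date first pulls the 22:00 bucket into the range.
with_trunc = truncate_to_hour(start) <= bucket <= stop

print(without_trunc, with_trunc)  # False True
```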