I have a leaves_table that contains id, holiday_start and holiday_end. I have another table, leaves_holiday, that contains each public holiday's name and its date. I want to add a new column to the leaves_table output that counts the days in each leave range that are public holidays, so that they can be excluded.
For example:
leaves_table
id. holiday_start. holiday_end
1. 09-Jul-2022. 13-Jul-2022
public holiday table
holiday_name. holiday_date
christmas 10-Jul-2022
The query should return the number of excluded days as 1:
id. holiday_start. holiday_end. excluded days
1 09-Jul-2022. 13-Jul-2022. 1
How do I do this?
Here are the CREATE TABLE and INSERT statements:
create table XX_LEAVES_EXCLUDES
(
exclude_id number not null primary key,
holiday_start date not null,
holiday_end date not null
);
create sequence seq_exclude_id MINVALUE 1
START WITH 1
INCREMENT BY 1
CACHE 2;
create or replace trigger trg_exclude_id
before insert
on XX_LEAVES_EXCLUDES
for each row
begin
:new.exclude_id:=seq_exclude_id.nextval;
end;
INSERT INTO XX_LEAVES_EXCLUDES (HOLIDAY_START, HOLIDAY_END) VALUES ('23-Jul-2022','20-Aug-2022');
INSERT INTO XX_LEAVES_EXCLUDES (HOLIDAY_START, HOLIDAY_END) VALUES ('01-Jul-2022','02-Aug-2022');
INSERT INTO XX_LEAVES_EXCLUDES (HOLIDAY_START, HOLIDAY_END) VALUES ('13-Jul-2022','29-Aug-2022');
INSERT INTO XX_LEAVES_EXCLUDES (HOLIDAY_START, HOLIDAY_END) VALUES ('12-Jul-2022','01-Aug-2022');
INSERT INTO XX_LEAVES_EXCLUDES (HOLIDAY_START, HOLIDAY_END) VALUES ('01-Jul-2022','29-Aug-2022');
INSERT INTO XX_LEAVES_EXCLUDES (HOLIDAY_START, HOLIDAY_END) VALUES ('08-Jul-2022','08-Aug-2022');
INSERT INTO XX_LEAVES_EXCLUDES (HOLIDAY_START, HOLIDAY_END) VALUES ('03-Jul-2022','20-Aug-2022');
Second table (public holiday calendar):
CREATE TABLE "XX_LEAVES_PUBLIC_HOLIDAYS"
( "PUBLIC_HOLIDAY_UAE_YEAR_2022" VARCHAR2(50) NOT NULL,
"HOLIDAY_DATE" DATE NOT NULL ENABLE
);
INSERT INTO XX_LEAVES_PUBLIC_HOLIDAYS (PUBLIC_HOLIDAY_UAE_YEAR_2022, HOLIDAY_DATE) VALUES ('National Day','10-Jul-2022');
Compare the leave date range with each holiday date and return the count as excluded_days:
select l.id, l.holiday_start, l.holiday_end,
(select Count(1) from leaves_holiday lh
where l.holiday_start<= lh.holiday_date and l.holiday_end >= lh.holiday_date) as excluded_days
from leaves_table l
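The correlated count above can be tried out without an Oracle instance. This is only a minimal sketch in Python with the standard sqlite3 module, assuming the dates are stored as ISO strings (so that string comparison matches date order); the helper name excluded_days is made up for the illustration.

```python
import sqlite3

def excluded_days(leaves, holidays):
    """Return (id, holiday_start, holiday_end, excluded_days) rows.

    Hypothetical helper: dates are ISO 'YYYY-MM-DD' strings, an assumption
    made so that plain string comparison orders them correctly in SQLite.
    """
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE leaves_table (id INTEGER, holiday_start TEXT, holiday_end TEXT)")
    con.execute("CREATE TABLE leaves_holiday (holiday_name TEXT, holiday_date TEXT)")
    con.executemany("INSERT INTO leaves_table VALUES (?, ?, ?)", leaves)
    con.executemany("INSERT INTO leaves_holiday VALUES (?, ?)", holidays)
    # Same shape as the query above: count the holidays inside each leave range.
    rows = con.execute("""
        SELECT l.id, l.holiday_start, l.holiday_end,
               (SELECT COUNT(*)
                  FROM leaves_holiday lh
                 WHERE lh.holiday_date BETWEEN l.holiday_start AND l.holiday_end) AS excluded_days
          FROM leaves_table l
         ORDER BY l.id
    """).fetchall()
    con.close()
    return rows

# The example from the question: one leave range, one public holiday inside it.
rows = excluded_days([(1, "2022-07-09", "2022-07-13")],
                     [("christmas", "2022-07-10")])
print(rows)
```

On Oracle DATE columns the comparison works directly, so the ISO-string assumption is only needed for this SQLite sketch.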
One option is to create a calendar of all holiday dates (the leaves_calendar CTE in my example) and then join it to public_holiday, so that you know which dates to exclude.
Sample data:
SQL> with
2 leaves_table (id, holiday_start, holiday_end) as
3 (select 1, date '2022-07-09', date '2022-07-13' from dual union all
4 select 2, date '2022-05-25', date '2022-05-30' from dual
5 ),
6 public_holiday (holiday_name, holiday_date) as
7 (select 'Christmas' , date '2022-07-10' from dual union all
8 select 'My holiday', date '2022-07-12' from dual),
9 --
Query begins here; first create a calendar ...
10 leaves_calendar as
11 (select l.id, l.holiday_start + column_value - 1 as datum
12 from leaves_table l cross join
13 table(cast(multiset(select level from dual
14 connect by level <= l.holiday_end - l.holiday_start + 1
15 ) as sys.odcinumberlist))
16 )
... then return the result: start and end date, number of excluded dates and holiday names (you didn't ask for that, but ... not a problem)
17 select c.id,
18 min(c.datum) as holiday_start,
19 max(c.datum) as holiday_end,
20 sum(case when p.holiday_date = c.datum then 1 else 0 end) as excluded_days,
21 listagg(p.holiday_name, ', ') within group (order by p.holiday_date) as excluded
22 from leaves_calendar c left join public_holiday p on p.holiday_date = c.datum
23 group by c.id;
ID HOLIDAY_START HOLIDAY_END EXCLUDED_DAYS EXCLUDED
---------- --------------- --------------- ------------- ------------------------------
1 09.07.2022 13.07.2022 2 Christmas, My holiday
2 25.05.2022 30.05.2022 0
SQL>
With sample data you provided:
SQL> select * from xx_leaves_excludes;
EXCLUDE_ID HOLIDAY_START HOLIDAY_END
---------- --------------- ---------------
1 23.07.2022 20.08.2022
2 01.07.2022 02.08.2022
3 13.07.2022 29.08.2022
4 12.07.2022 01.08.2022
5 01.07.2022 29.08.2022
6 08.07.2022 08.08.2022
7 03.07.2022 20.08.2022
7 rows selected.
SQL> select * from public_holiday;
HOLIDAY_NAME HOLIDAY_DATE
--------------- ---------------
Christmas 10.07.2022
My holiday 12.07.2022
Query looks like this:
SQL> with
2 leaves_calendar as
3 (select l.exclude_id, l.holiday_start + column_value - 1 as datum
4 from xx_leaves_excludes l cross join
5 table(cast(multiset(select level from dual
6 connect by level <= l.holiday_end - l.holiday_start + 1
7 ) as sys.odcinumberlist))
8 )
9 select c.exclude_id,
10 min(c.datum) as holiday_start,
11 max(c.datum) as holiday_end,
12 sum(case when p.holiday_date = c.datum then 1 else 0 end) as excluded_days,
13 listagg(p.holiday_name, ', ') within group (order by p.holiday_date) as excluded
14 from leaves_calendar c left join public_holiday p on p.holiday_date = c.datum
15 group by c.exclude_id;
EXCLUDE_ID HOLIDAY_START HOLIDAY_END EXCLUDED_DAYS EXCLUDED
---------- --------------- --------------- ------------- ----------------------------------------
1 23.07.2022 20.08.2022 0
2 01.07.2022 02.08.2022 2 Christmas, My holiday
3 13.07.2022 29.08.2022 0
4 12.07.2022 01.08.2022 1 My holiday
5 01.07.2022 29.08.2022 2 Christmas, My holiday
6 08.07.2022 08.08.2022 2 Christmas, My holiday
7 03.07.2022 20.08.2022 2 Christmas, My holiday
7 rows selected.
SQL>
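For readers without Oracle at hand, the same calendar idea can be sketched with a recursive CTE instead of CONNECT BY. This is an illustration only, written in Python with the standard sqlite3 module; it assumes ISO date strings and reuses the sample rows from the first example above.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE leaves_table (id INTEGER, holiday_start TEXT, holiday_end TEXT);
    CREATE TABLE public_holiday (holiday_name TEXT, holiday_date TEXT);
    INSERT INTO leaves_table VALUES (1, '2022-07-09', '2022-07-13'),
                                    (2, '2022-05-25', '2022-05-30');
    INSERT INTO public_holiday VALUES ('Christmas', '2022-07-10'),
                                      ('My holiday', '2022-07-12');
""")

calendar_rows = con.execute("""
    -- One row per leave id and per day between holiday_start and holiday_end.
    WITH RECURSIVE leaves_calendar (id, datum, holiday_end) AS (
        SELECT id, holiday_start, holiday_end FROM leaves_table
        UNION ALL
        SELECT id, date(datum, '+1 day'), holiday_end
          FROM leaves_calendar
         WHERE datum < holiday_end
    )
    -- COUNT(p.holiday_date) ignores the NULLs produced by the outer join.
    SELECT c.id, MIN(c.datum), MAX(c.datum), COUNT(p.holiday_date) AS excluded_days
      FROM leaves_calendar c
      LEFT JOIN public_holiday p ON p.holiday_date = c.datum
     GROUP BY c.id
     ORDER BY c.id
""").fetchall()
print(calendar_rows)
```

The recursive CTE plays the role of the table(cast(multiset(...))) row generator; everything after it is the same outer join and aggregation.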
I'm hoping this is an easy fix. I have two tables: one lists every day over a 6-month period, and the other holds site names, the day (date) and the count of attendances on that day.
I want to create a table that has, for each site, a row for every day in the 6-month period with that site's count for the day, and that also shows NULL where there was no attendance on that day. I can produce only the days with attendance, but not the other way around.
Example data here: NOTE, the data is held in two temporary tables
Date table #Data
CallDate rn
2022-08-01 1
2022-08-02 2
2022-08-03 3
2022-08-04 4
2022-08-05 5
2022-08-06 6
2022-08-07 7
2022-08-08 8
Attendance table: #SiteData
SiteName CallDate Count
Bassetlaw 2022-08-30 1
Bassetlaw 2022-08-31 1
Bassetlaw 2022-09-13 3
Bassetlaw 2022-09-15 5
Bassetlaw 2022-09-23 1
Bassetlaw 2022-09-27 1
Bassetlaw 2022-11-21 1
Bassetlaw 2022-11-23 1
Bassetlaw 2022-11-26 1
Bassetlaw 2022-11-28 1
So in this instance, I would have 6 months worth of rows, but only 10 days worth of data. I need NULLs for the other days, not just 8 rows.
NOTE: There are more sites, and I want this repeated for every site. In essence, I want a table with one row per site per day for 6 months, irrespective of whether the site had an attendance that day.
This is done by using the LEFT JOIN command.
See here: http://sqlfiddle.com/#!18/0218c/2
CREATE TABLE T_Data (
rn int,
CallDate date
);
CREATE TABLE T_SiteData (
CallDate date,
Amount int
);
INSERT INTO T_Data SELECT 1, '2022-08-01';
INSERT INTO T_Data SELECT 2, '2022-08-02';
INSERT INTO T_Data SELECT 3, '2022-08-03';
INSERT INTO T_Data SELECT 4, '2022-08-04';
INSERT INTO T_Data SELECT 5, '2022-08-05';
INSERT INTO T_Data SELECT 6, '2022-08-06';
INSERT INTO T_Data SELECT 7, '2022-08-07';
INSERT INTO T_Data SELECT 8, '2022-08-08';
INSERT INTO T_Data SELECT 10, '2022-08-09';
INSERT INTO T_Data SELECT 11, '2022-08-10';
INSERT INTO T_Data SELECT 12, '2022-08-11';
INSERT INTO T_Data SELECT 13, '2022-08-12';
INSERT INTO T_Data SELECT 14, '2022-08-13';
INSERT INTO T_Data SELECT 15, '2022-08-14';
INSERT INTO T_SiteData SELECT '2022-08-01', 1;
INSERT INTO T_SiteData SELECT '2022-08-03', 1;
INSERT INTO T_SiteData SELECT '2022-08-05', 3;
INSERT INTO T_SiteData SELECT '2022-08-12', 5;
SELECT
d.*,
sd.Amount AS [Count]
FROM
T_Data AS d
LEFT JOIN
T_SiteData AS sd
ON
sd.CallDate = d.CallDate
This returns the following:
rn CallDate Count
1 2022-08-01 1
2 2022-08-02 (null)
3 2022-08-03 1
4 2022-08-04 (null)
5 2022-08-05 3
6 2022-08-06 (null)
7 2022-08-07 (null)
8 2022-08-08 (null)
10 2022-08-09 (null)
11 2022-08-10 (null)
12 2022-08-11 (null)
13 2022-08-12 5
14 2022-08-13 (null)
15 2022-08-14 (null)
I used the MS SQL server syntax. But the SELECT query should be the same on almost any other DBMS.
If the table SiteData contains more than one entry per CallDate, the query needs to be adjusted. But your example only showed one entry per CallDate.
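Since the question asks for one row per site per day, the join can be extended with a cross join between the distinct sites and the date table. The following is a rough sketch in Python with the standard sqlite3 module, not the fiddle above; the SiteName column on T_SiteData, the site 'OtherSite' and the three sample dates are assumptions made for the illustration.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE T_Data (rn INTEGER, CallDate TEXT);
    -- Assumed layout: the attendance table also carries the site name.
    CREATE TABLE T_SiteData (SiteName TEXT, CallDate TEXT, Amount INTEGER);
    INSERT INTO T_Data VALUES (1, '2022-08-01'), (2, '2022-08-02'), (3, '2022-08-03');
    INSERT INTO T_SiteData VALUES
        ('Bassetlaw', '2022-08-01', 1),
        ('Bassetlaw', '2022-08-03', 3),
        ('OtherSite', '2022-08-02', 5);  -- hypothetical second site
""")

site_day_rows = con.execute("""
    -- Every site paired with every day, then the counts attached where they exist.
    SELECT s.SiteName, d.CallDate, sd.Amount
      FROM (SELECT DISTINCT SiteName FROM T_SiteData) AS s
     CROSS JOIN T_Data AS d
      LEFT JOIN T_SiteData AS sd
        ON sd.SiteName = s.SiteName
       AND sd.CallDate = d.CallDate
     ORDER BY s.SiteName, d.CallDate
""").fetchall()

for row in site_day_rows:
    print(row)  # days without attendance show None (NULL) as the amount
```

A site that never appears in the attendance table would be missing from the distinct list, so a separate table of sites would be the safer source if one exists.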
Data source
User ID  Visit Date
1        2020-01-01 12:29:15
1        2020-01-02 12:30:11
1        2020-04-01 12:31:01
2        2020-05-01 12:31:14
Problem
I need advice. I'm trying to write a subquery that marks a user as retention when they haven't come back for 3 months. I'm using this query to get each user's latest visit per month, including nulls:
select u.user_id, gs.yyyymm, s.last_visit_date
from (select distinct user_id from source s) u cross join
generate_series('2021-01-01'::timestamp, '2021-12-01'::timestamp, interval '1 month'
) gs(yyyymm) left join lateral
(select max(s.visit_date) as last_visit_date
from source s
where s.user_id = u.user_id and
s.visit_date >= gs.yyyymm and
s.visit_date < gs.yyyymm + interval '1 month'
) s
on 1=1;
But I think this will really hurt performance as the number of users keeps increasing. Do you have any advice on how to achieve a result like the one below?
Expected Result
Month  User ID  Type
1      1        FIRST
2      1        RETENTION
3      1        RETENTION
4      1        REACTIVATE
....
12     1        null
1      2        null
...
5      2        FIRST
6      2        RETENTION
7      2        RETENTION
8      2        RETENTION
9      2        null
... and so on
or it could be like this
Month  First  Retention  Reactivate
1      1      0          0
2      0      1          0
3      0      1          0
4      0      0          1
5      1      0          0
6      0      1          0
7      0      1          0
8      0      1          0
9      0      0          0
... and so on
My solution has some reasonable requirements, but it can work without them (at the price of some performance).
I built some helper tables that are filled automatically by a trigger.
The requirement is that UPDATE and DELETE aren't allowed on the visit table.
The mst_user table stores the distinct user_ids and each user's first_visit.
The user_monthly_visit table stores the first and the last visit_date per user_id and month.
TABLES
CREATE TABLE mst_user (
id BIGINT,
first_visit TIMESTAMP,
CONSTRAINT pk_mst_user PRIMARY KEY (id)
);
CREATE TABLE visit (
user_id BIGINT,
visit_date TIMESTAMP,
CONSTRAINT visit_user_fkey FOREIGN KEY (user_id) REFERENCES mst_user (id)
);
CREATE TABLE user_monthly_visit (
user_id BIGINT,
month DATE,
first_visit_this_month TIMESTAMP,
last_visit_this_month TIMESTAMP,
CONSTRAINT pk_user_monthly_visit PRIMARY KEY (user_id, month),
CONSTRAINT user_monthly_visit_user_fkey FOREIGN KEY (user_id) REFERENCES mst_user (id)
);
CREATE INDEX ix_user_monthly_visit_month ON user_monthly_visit(month);
TRIGGER
CREATE OR REPLACE FUNCTION trf_visit() RETURNS trigger
VOLATILE
AS $xx$
DECLARE
l_user_id BIGINT;
l_row RECORD;
l_user_monthly_visit user_monthly_visit;
BEGIN
IF (tg_op = 'INSERT')
THEN
l_row := NEW;
INSERT INTO mst_user(id, first_visit) VALUES (l_row.user_id, l_row.visit_date)
ON CONFLICT(id) DO UPDATE SET first_visit = LEAST(mst_user.first_visit, l_row.visit_date);
INSERT INTO user_monthly_visit(user_id,month,first_visit_this_month,last_visit_this_month) VALUES (l_row.user_id,date_trunc('month',l_row.visit_date),l_row.visit_date,l_row.visit_date)
ON CONFLICT(user_id,month) DO UPDATE SET first_visit_this_month = LEAST(user_monthly_visit.first_visit_this_month,l_row.visit_date),
last_visit_this_month = GREATEST(user_monthly_visit.last_visit_this_month,l_row.visit_date);
ELSE
RAISE EXCEPTION 'UPDATE and DELETE arent allowed!';
END IF;
RETURN l_row;
END;
$xx$ LANGUAGE plpgsql;
CREATE TRIGGER trig_visit
BEFORE INSERT OR DELETE OR UPDATE ON visit
FOR EACH ROW
EXECUTE PROCEDURE trf_visit();
TESTDATA
INSERT INTO visit (user_id, visit_date)
VALUES (1, '20200101 122915');
INSERT INTO visit (user_id, visit_date)
VALUES (1, '20200102 123011');
INSERT INTO visit (user_id, visit_date)
VALUES (1, '20200401 123101');
INSERT INTO visit (user_id, visit_date)
VALUES (2, '20200501 123114');
QUERY
SELECT mnt AS month, user_id,
CASE WHEN first_visit IS NULL OR first_visit> yyyymm + INTERVAL '1 month' THEN NULL
WHEN first_visit_this_month = first_visit THEN 'FIRST'
WHEN first_visit_this_month IS NULL AND last_three_month + INTERVAL '3 month' >= yyyymm THEN 'RETENTION'
WHEN first_visit_this_month IS NOT NULL THEN 'REACTIVATE'
ELSE NULL
END user_type
FROM
(SELECT date_part('month', gs.yyyymm)::INTEGER AS mnt, gs.yyyymm, u.id user_id, umv.first_visit_this_month, umv.last_visit_this_month, u.first_visit,
GREATEST(
LAG(last_visit_this_month) OVER w,
LAG(last_visit_this_month,2) OVER w,
LAG(last_visit_this_month,3) OVER w
) last_three_month
FROM
generate_series('2020-01-01'::TIMESTAMP, '2020-12-01'::TIMESTAMP, INTERVAL '1 month') gs(yyyymm)
CROSS JOIN mst_user u
LEFT JOIN user_monthly_visit umv on (umv.user_id=u.id AND umv.month = gs.yyyymm)
WINDOW w AS (PARTITION BY u.id ORDER BY gs.yyyymm)
) monthly_visit
ORDER BY user_id,mnt;
RESULT
month  user_id  user_type
1      1        FIRST
2      1        RETENTION
3      1        RETENTION
4      1        REACTIVATE
5      1        RETENTION
6      1        RETENTION
7      1        RETENTION
8      1        (null)
9      1        (null)
10     1        (null)
11     1        (null)
12     1        (null)
1      2        (null)
2      2        (null)
3      2        (null)
4      2        (null)
5      2        FIRST
6      2        RETENTION
7      2        RETENTION
8      2        RETENTION
9      2        (null)
10     2        (null)
11     2        (null)
12     2        (null)
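To make the classification rules behind this result easier to follow, here is a small sketch in plain Python rather than SQL. It is only an illustration: the function name classify is made up, and it assumes all visits fall within one calendar year, as in the test data.

```python
from datetime import datetime

def classify(visits, year=2020):
    """Map (month, user_id) to 'FIRST', 'REACTIVATE', 'RETENTION' or None.

    Hypothetical helper; `visits` is a list of (user_id, datetime) pairs and
    only visits inside `year` are considered (a simplifying assumption).
    """
    months_by_user = {}
    for user_id, visited_at in visits:
        if visited_at.year == year:
            months_by_user.setdefault(user_id, set()).add(visited_at.month)

    result = {}
    for user_id, months in months_by_user.items():
        first = min(months)
        for m in range(1, 13):
            if m < first:
                label = None              # before the user's first visit
            elif m == first:
                label = "FIRST"           # month of the first visit
            elif m in months:
                label = "REACTIVATE"      # visited again in this month
            elif any(p in months for p in range(m - 3, m)):
                label = "RETENTION"       # no visit, but one in the prior 3 months
            else:
                label = None              # inactive for more than 3 months
            result[(m, user_id)] = label
    return result

labels = classify([
    (1, datetime(2020, 1, 1, 12, 29, 15)),
    (1, datetime(2020, 1, 2, 12, 30, 11)),
    (1, datetime(2020, 4, 1, 12, 31, 1)),
    (2, datetime(2020, 5, 1, 12, 31, 14)),
])
print(labels[(4, 1)], labels[(8, 1)])
```

The order of the checks mirrors the CASE expression in the query: the first-visit test comes first, and the three-month lookback stands in for the three LAG calls.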
Restating your question so it is clearer to me:
For any given month, the user is classified as first, retention or reactivate based on the following criteria:
first: month of first visit
retention: within 3 months since previous visit
reactivate: month of visit & no visit in prior month
If I understood this correctly, you can get the first desired result with the following query
Schema (PostgreSQL v13)
create table visits (user_id int, visit_at timestamp);
insert into visits values
(1, '2020-01-01 12:29:15'),
(1, '2020-01-02 12:30:11'),
(1, '2020-04-01 12:31:01'),
(2, '2020-05-01 12:31:14');
Query
WITH trange AS (
SELECT
user_id
, DATE_TRUNC('month', min(visit_at)) visit_from
, DATE_TRUNC('month', max(visit_at)) + interval '3 month' visit_to
FROM visits
GROUP BY 1
)
, monthly_visits AS (
SELECT DISTINCT
user_id
, DATE_TRUNC('month', visit_at) visit_month
FROM visits
)
SELECT
trange.user_id
, DATE(m) report_month
, CASE
WHEN visit_from = m THEN 'FIRST'
WHEN visit_month = m AND LAST_VALUE(visit_month) OVER w IS NULL THEN 'REACTIVATE'
WHEN m <= MAX(visit_month) OVER w + INTERVAL '3 MONTH' THEN 'RETENTION'
ELSE NULL END user_type
FROM trange
LEFT JOIN LATERAL GENERATE_SERIES(visit_from, visit_to, '1 month') m(m)
ON true
LEFT JOIN monthly_visits
ON monthly_visits.user_id = trange.user_id
AND monthly_visits.visit_month = m.m
WINDOW w AS (
PARTITION BY trange.user_id
ORDER BY m.m
ROWS BETWEEN UNBOUNDED PRECEDING AND 1 PRECEDING
);
user_id  report_month              user_type
1        2020-01-01T00:00:00.000Z  FIRST
1        2020-02-01T00:00:00.000Z  RETENTION
1        2020-03-01T00:00:00.000Z  RETENTION
1        2020-04-01T00:00:00.000Z  REACTIVATE
1        2020-05-01T00:00:00.000Z  RETENTION
1        2020-06-01T00:00:00.000Z  RETENTION
1        2020-07-01T00:00:00.000Z  RETENTION
2        2020-05-01T00:00:00.000Z  FIRST
2        2020-06-01T00:00:00.000Z  RETENTION
2        2020-07-01T00:00:00.000Z  RETENTION
2        2020-08-01T00:00:00.000Z  RETENTION
I have a table with the following schema:
CREATE TABLE Codes
(
diagnosis_code CHAR,
visit_date DATE,
visit_id INT,
patient_id int
);
I would like to output the patient_ids where the patient is readmitted (so a different visit_id) with the same diagnosis_code within a certain time (say 15 days). For example, if I have the following entries in the table:
diagnosis_code visit_date visit_id patient_id
-------------- ---------- ----------- -----------
A 2018-01-01 1 1
B 2018-01-01 1 1
A 2018-01-07 2 1
C 2018-01-01 3 2
D 2018-01-01 4 3
D 2018-01-20 5 3
E 2018-01-01 6 4
E 2018-01-01 6 4
A 2018-01-07 7 1
The query would return only patient_id = 1, and the rationales are as follows:
1, because between visit_id 1 and 2, this patient shared diagnosis code A.
Not 2 because this patient was only admitted once.
Not 3 because this patient, although readmitted for the same diagnosis, was not readmitted within 15 days of their initial visit.
Not 4 because this patient has a duplicated diagnosis code in the same visit.
Notice that patient_id = 1 is readmitted for the same diagnosis during visit_id = 7, but he was already counted once before.
You could try a simple join, adding the conditions you described:
select
distinct c.patient_id
from codes c
join codes d on d.patient_id = c.patient_id
and d.visit_id <> c.visit_id
and d.diagnosis_code = c.diagnosis_code
and d.visit_date between c.visit_date
and dateadd(day, 15, c.visit_date)
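To check the self-join against the sample rows from the question, here is a quick sketch in Python with the standard sqlite3 module. It is an approximation rather than the T-SQL above: dates are assumed to be ISO strings, and SQLite's date(x, '+15 days') stands in for dateadd(day, 15, x).

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("""CREATE TABLE codes (
    diagnosis_code TEXT, visit_date TEXT, visit_id INTEGER, patient_id INTEGER)""")
con.executemany("INSERT INTO codes VALUES (?, ?, ?, ?)", [
    ("A", "2018-01-01", 1, 1),
    ("B", "2018-01-01", 1, 1),
    ("A", "2018-01-07", 2, 1),
    ("C", "2018-01-01", 3, 2),
    ("D", "2018-01-01", 4, 3),
    ("D", "2018-01-20", 5, 3),
    ("E", "2018-01-01", 6, 4),
    ("E", "2018-01-01", 6, 4),
    ("A", "2018-01-07", 7, 1),
])

# Pair each visit with a different visit of the same patient and diagnosis
# that happened within 15 days after it.
readmitted = [row[0] for row in con.execute("""
    SELECT DISTINCT c.patient_id
      FROM codes c
      JOIN codes d
        ON d.patient_id = c.patient_id
       AND d.visit_id <> c.visit_id
       AND d.diagnosis_code = c.diagnosis_code
       AND d.visit_date BETWEEN c.visit_date AND date(c.visit_date, '+15 days')
     ORDER BY c.patient_id
""")]
print(readmitted)
```

Patient 4 drops out because both E rows share one visit_id, and patient 3 because the second D visit is 19 days after the first.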
I used LAG.
declare @Codes table
(
diagnosis_code CHAR,
visit_date DATE,
visit_id INT,
patient_id int
);
insert into @Codes
values
('A', '2018-01-01' ,1, 1)
,('B' , '2018-01-01', 1, 1)
,('A' , '2018-01-07', 2, 1)
,('C' ,'2018-01-01', 3, 2)
/*
D 2018-01-01 4 3
D 2018-01-15 5 3
E 2018-01-01 6 4
E 2018-01-01 6 4
A 2018-01-07 7 1
*/
select *
from (
select *
--,rn=row_number() over (partition by patient_ID,diagnosis_code order by visit_date)
-- days since the previous visit of the same patient with the same diagnosis code
,DaysSince = datediff(day,lag(visit_date,1) over (partition by patient_ID,diagnosis_code order by visit_date),visit_date)
from @Codes
) a
where a.DaysSince<=15
You can also use the built-in FIRST_VALUE and DATEADD functions to achieve this:
SELECT
DISTINCT patient_id,diagnosis_code
FROM
(SELECT
FIRST_VALUE(visit_date) OVER (PARTITION BY patient_id,diagnosis_code ORDER BY visit_id ASC) AS Initial_Visit,
DATEADD(DAY,15,first_value(visit_date) OVER (PARTITION BY patient_id,diagnosis_code ORDER BY visit_id ASC)) Window
,* FROM Codes
)m
WHERE
Initial_Visit <> visit_date
AND visit_date <= Window
I have a table UPCALL_HISTORY that has 3 columns: SUBSCRIBER_ID, START_DATE and END_DATE. Let the number of unique subscribers be N.
I want to create a new table with 3 columns:
SUBSCRIBER_ID: All of the unique subscriber ids repeated 36 times in a row.
MONTHLY_CALENDAR_ID: For each SUBSCRIBER_ID, this column will have dates listed from July 2015 until July 2018 (36 months).
ACTIVE: This column will be used as a flag for each subscriber and whether they have a subscription during that month. This subscription data is in a table called UPCALL_HISTORY.
I am fairly new to SQL, don't have a lot of experience. I am good at Python but it seems that SQL doesn't work like Python.
Any query ideas that could help me build this table?
Let my UPCALL_HISTORY table be:
+---------------+------------+------------+
| SUBSCRIBER_ID | START_DATE | END_DATE |
+---------------+------------+------------+
| 119 | 01/07/2015 | 01/08/2015 |
| 120 | 01/08/2015 | 01/09/2015 |
| 121 | 01/09/2015 | 01/10/2015 |
+---------------+------------+------------+
I want a table that looks like:
+---------------+------------+--------+
| SUBSCRIBER_ID | MON_CA | ACTIVE |
+---------------+------------+--------+
| 119 | 01/07/2015 | 1 |
| * | 01/08/2015 | 0 |
| * | 01/09/2015 | 0 |
| (36 times) | 01/10/2015 | 0 |
| * | * | 0 |
| 119 | 01/07/2018 | 0 |
+---------------+------------+--------+
that continues for 120 and 121
EDIT: Added Example
Here's how I understood the question.
Sample table and several rows:
SQL> create table upcall_history
2 (subscriber_id number,
3 start_date date,
4 end_date date);
Table created.
SQL> insert into upcall_history
2 select 1, date '2015-12-25', date '2016-01-13' from dual union
3 select 1, date '2017-07-10', date '2017-07-11' from dual union
4 select 2, date '2018-01-01', date '2018-04-24' from dual;
3 rows created.
Create a new table. For distinct SUBSCRIBER_ID's, it creates 36 "monthly" rows, fixed (as you stated).
SQL> create table new_table as
2 select
3 x.subscriber_id,
4 add_months(date '2015-07-01', column_value - 1) monthly_calendar_id,
5 0 active
6 from (select distinct subscriber_id from upcall_history) x,
7 table(cast(multiset(select level from dual
8 connect by level <= 36
9 ) as sys.odcinumberlist));
Table created.
Update ACTIVE column value to "1" for rows whose MONTHLY_CALENDAR_ID is contained in START_DATE and END_DATE of the UPCALL_HISTORY table.
SQL> merge into new_table n
2 using (select subscriber_id, start_date, end_date from upcall_history) x
3 on ( n.subscriber_id = x.subscriber_id
4 and n.monthly_calendar_id between trunc(x.start_date, 'mm')
5 and trunc(x.end_date, 'mm')
6 )
7 when matched then
8 update set n.active = 1;
7 rows merged.
SQL>
Result (only ACTIVE = 1):
SQL> select * from new_table
2 where active = 1
3 order by subscriber_id, monthly_calendar_id;
SUBSCRIBER_ID MONTHLY_CA ACTIVE
------------- ---------- ----------
1 2015-12-01 1
1 2016-01-01 1
1 2017-07-01 1
2 2018-01-01 1
2 2018-02-01 1
2 2018-03-01 1
2 2018-04-01 1
7 rows selected.
SQL>
If you're on 12c you can use an inline view of all the months with cross apply to get the combinations of those with all IDs:
select uh.subscriber_id, m.month,
case when trunc(uh.start_date, 'MM') <= m.month
and (uh.end_date is null or uh.end_date >= add_months(m.month, 1))
then 1 else 0 end as active
from upcall_history uh
cross apply (
select add_months(trunc(sysdate, 'MM'), - level) as month
from dual
connect by level <= 36
) m
order by uh.subscriber_id, m.month;
I've made it a rolling 36-months window up to the current month, but you may actually want fixed dates as you had in the question.
With sample data from a CTE:
with upcall_history (subscriber_id, start_date, end_date) as (
select 1, date '2015-09-04', date '2015-12-15' from dual
union all select 2, date '2017-12-04', date '2018-05-15' from dual
)
that generates 72 rows:
SUBSCRIBER_ID MONTH ACTIVE
------------- ---------- ----------
1 2015-07-01 0
1 2015-08-01 0
1 2015-09-01 1
1 2015-10-01 1
1 2015-11-01 1
1 2015-12-01 0
1 2016-01-01 0
...
2 2017-11-01 0
2 2017-12-01 1
2 2018-01-01 1
2 2018-02-01 1
2 2018-03-01 1
2 2018-04-01 1
2 2018-05-01 0
2 2018-06-01 0
You can use that to create a new table, or populate an existing table; though if you do want a rolling window then a view might be more appropriate.
If you aren't on 12c then cross apply isn't available - you'll get "ORA-00905: missing keyword".
You can get the same result with two CTEs (one to get all the months, the other to get all the IDs) cross-joined, and then outer joined to your actual data:
with m (month) as (
select add_months(trunc(sysdate, 'MM'), - level)
from dual
connect by level <= 36
),
i (subscriber_id) as (
select distinct subscriber_id
from upcall_history
)
select i.subscriber_id, m.month,
case when uh.subscriber_id is null then 0 else 1 end as active
from m
cross join i
left join upcall_history uh
on uh.subscriber_id = i.subscriber_id
and trunc(uh.start_date, 'MM') <= m.month
and (uh.end_date is null or uh.end_date >= add_months(m.month, 1))
order by i.subscriber_id, m.month;
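The months-times-IDs idea can also be tried outside Oracle. Below is a rough sketch in Python with the standard sqlite3 module, not the query above: it assumes ISO date strings, uses a fixed 36-month window starting at 2015-07-01 instead of a rolling one, and swaps the outer join for EXISTS so that overlapping subscriptions cannot duplicate rows.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE upcall_history (subscriber_id INTEGER, start_date TEXT, end_date TEXT);
    INSERT INTO upcall_history VALUES
        (119, '2015-07-01', '2015-08-01'),
        (120, '2015-08-01', '2015-09-01'),
        (121, '2015-09-01', '2015-10-01');
""")

grid = con.execute("""
    -- m: 36 consecutive month starts; i: the distinct subscriber ids.
    WITH RECURSIVE m (month, n) AS (
        SELECT '2015-07-01', 1
        UNION ALL
        SELECT date(month, '+1 month'), n + 1 FROM m WHERE n < 36
    ),
    i (subscriber_id) AS (
        SELECT DISTINCT subscriber_id FROM upcall_history
    )
    -- Active when a subscription starts in or before the month and runs
    -- at least until the start of the following month.
    SELECT i.subscriber_id, m.month,
           EXISTS (SELECT 1
                     FROM upcall_history uh
                    WHERE uh.subscriber_id = i.subscriber_id
                      AND date(uh.start_date, 'start of month') <= m.month
                      AND (uh.end_date IS NULL
                           OR uh.end_date >= date(m.month, '+1 month'))) AS active
      FROM m
     CROSS JOIN i
     ORDER BY i.subscriber_id, m.month
""").fetchall()

active_rows = [row for row in grid if row[2] == 1]
print(len(grid), active_rows)
```

Whether a subscription that ends on the first of a month counts as active for that month is a judgment call; the condition here follows the one used in the query above.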
You can do this in 11g using Partitioned Outer Joins, like so:
WITH upcall_history AS (SELECT 119 subscriber_id, to_date('01/07/2015', 'dd/mm/yyyy') start_date, to_date('01/08/2015', 'dd/mm/yyyy') end_date FROM dual UNION ALL
SELECT 120 subscriber_id, to_date('01/08/2015', 'dd/mm/yyyy') start_date, to_date('01/09/2015', 'dd/mm/yyyy') end_date FROM dual UNION ALL
SELECT 121 subscriber_id, to_date('01/09/2015', 'dd/mm/yyyy') start_date, to_date('01/10/2015', 'dd/mm/yyyy') end_date FROM dual),
mnths AS (SELECT add_months(TRUNC(SYSDATE, 'mm'), + 1 - LEVEL) mnth
FROM dual
CONNECT BY LEVEL <= 12 * 3 + 1)
SELECT uh.subscriber_id,
m.mnth,
CASE WHEN mnth BETWEEN start_date AND end_date - 1 THEN 1 ELSE 0 END active
FROM mnths m
LEFT OUTER JOIN upcall_history uh PARTITION BY (uh.subscriber_id) ON (1=1)
ORDER BY uh.subscriber_id,
m.mnth;
SUBSCRIBER_ID MNTH ACTIVE
------------- ----------- ----------
119 01/07/2015 1
119 01/08/2015 0
119 01/09/2015 0
119 01/10/2015 0
<snip>
119 01/06/2018 0
119 01/07/2018 0
--
120 01/07/2015 0
120 01/08/2015 1
120 01/09/2015 0
120 01/10/2015 0
<snip>
120 01/06/2018 0
120 01/07/2018 0
--
121 01/07/2015 0
121 01/08/2015 0
121 01/09/2015 1
121 01/10/2015 0
<snip>
121 01/06/2018 0
121 01/07/2018 0
N.B. I have assumed some things about your start/end dates and what constitutes active; hopefully it should be easy enough for you to tweak the case statement to fit the logic that works best for your situation.
I also believe this can be an example of a CROSS JOIN. All I had to do was create a small table of all of the dates and then CROSS JOIN it with the table of subscribers.
Example: https://www.essentialsql.com/cross-join-introduction/
Suppose I have my table like:
uid day_used_app
--- -------------
1 2012-04-28
1 2012-04-29
1 2012-04-30
2 2012-04-29
2 2012-04-30
2 2012-05-01
2 2012-05-21
2 2012-05-22
Suppose I want the number of unique users who returned to the app at least 2 different days in the last 7 days (from 2012-05-03).
So as an example to retrieve the number of users who have used the application on at least 2 different days in the past 7 days:
select count(distinct case when num_different_days_on_app >= 2
then uid else null end) as users_return_2_or_more_days
from (
select uid,
count(distinct day_used_app) as num_different_days_on_app
from table
where day_used_app between current_date() - 7 and current_date()
group by 1
)
This gives me:
users_return_2_or_more_days
---------------------------
2
The question I have is:
What if I want to do this for every day up to now, so that my table looks like this, where the second field equals the number of unique users who returned on 2 or more different days within the week prior to the date in the first field?
date users_return_2_or_more_days
-------- ---------------------------
2012-04-28 2
2012-04-29 2
2012-04-30 3
2012-05-01 4
2012-05-02 4
2012-05-03 3
Would this help?
WITH
-- your original input, don't use in "real" query ...
input(uid,day_used_app) AS (
SELECT 1,DATE '2012-04-28'
UNION ALL SELECT 1,DATE '2012-04-29'
UNION ALL SELECT 1,DATE '2012-04-30'
UNION ALL SELECT 2,DATE '2012-04-29'
UNION ALL SELECT 2,DATE '2012-04-30'
UNION ALL SELECT 2,DATE '2012-05-01'
UNION ALL SELECT 2,DATE '2012-05-21'
UNION ALL SELECT 2,DATE '2012-05-22'
)
-- end of input, start "real" query here, replace ',' with 'WITH'
,
one_week_b4 AS (
SELECT
uid
, day_used_app
, day_used_app -7 AS day_used_1week_b4
FROM input
)
SELECT
one_week_b4.uid
, one_week_b4.day_used_app
, count(*) AS users_return_2_or_more_days
FROM one_week_b4
JOIN input
ON input.day_used_app BETWEEN one_week_b4.day_used_1week_b4 AND one_week_b4.day_used_app
GROUP BY
one_week_b4.uid
, one_week_b4.day_used_app
HAVING count(*) >= 2
ORDER BY 1;
Output is:
uid|day_used_app|users_return_2_or_more_days
1|2012-04-29 | 3
1|2012-04-30 | 5
2|2012-04-29 | 3
2|2012-04-30 | 5
2|2012-05-01 | 6
2|2012-05-22 | 2
Does that help your needs?
Marco the Sane ...
SELECT DISTINCT
t1.day_used_app,
(
SELECT SUM(CASE WHEN t.num_visits >= 2 THEN 1 ELSE 0 END)
FROM
(
SELECT uid,
COUNT(DISTINCT day_used_app) AS num_visits
FROM table
WHERE day_used_app BETWEEN t1.day_used_app - 7 AND t1.day_used_app
GROUP BY uid
) t
) AS users_return_2_or_more_days
FROM table t1
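The query above only reports dates that occur in the table. If every calendar day should appear, one option is to generate the dates first and join the usage rows to them. This is a sketch only, in Python with the standard sqlite3 module; the table name app_usage, the fixed date range and the ISO date strings are assumptions made for the illustration.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE app_usage (uid INTEGER, day_used_app TEXT)")
con.executemany("INSERT INTO app_usage VALUES (?, ?)", [
    (1, "2012-04-28"), (1, "2012-04-29"), (1, "2012-04-30"),
    (2, "2012-04-29"), (2, "2012-04-30"), (2, "2012-05-01"),
    (2, "2012-05-21"), (2, "2012-05-22"),
])

daily = con.execute("""
    -- days: one row per calendar day in the reporting range.
    WITH RECURSIVE days (d) AS (
        SELECT '2012-04-28'
        UNION ALL
        SELECT date(d, '+1 day') FROM days WHERE d < '2012-05-03'
    ),
    -- One row per (day, user) where the user was active on at least
    -- 2 different days between day - 7 and day (inclusive, as above).
    returning_users AS (
        SELECT days.d AS d, u.uid AS uid
          FROM days
          JOIN app_usage u
            ON u.day_used_app BETWEEN date(days.d, '-7 days') AND days.d
         GROUP BY days.d, u.uid
        HAVING COUNT(DISTINCT u.day_used_app) >= 2
    )
    SELECT days.d, COUNT(r.uid) AS users_return_2_or_more_days
      FROM days
      LEFT JOIN returning_users r ON r.d = days.d
     GROUP BY days.d
     ORDER BY days.d
""").fetchall()

for row in daily:
    print(row)
```

With the sample rows this gives 0 users for 2012-04-28 and 2 users for 2012-05-03; the outer join keeps the days on which nobody qualifies.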