SQL - Query to return active subscriptions on a given day - sql

I have a table that shows when a user signs up for a subscription and when their membership will expire. A user can purchase a new subscription even if their current one is in force.
userid|purchasedate|expirydate
1 |2019-01-01 |2019-02-01
2 |2019-01-02 |2019-02-02
3 |2019-01-03 |2019-02-03
3 |2019-01-04 |2019-03-03
I need a SQL query that will GROUP BY the date and return the number of active subscriptions on that date. So it would return:
date |count
2019-01-01|1
2019-01-02|2
2019-01-03|3
2019-01-04|3

Below is for BigQuery Standard SQL
#standardSQL
SELECT day, COUNT(DISTINCT userid) active_subscriptions
FROM (SELECT AS STRUCT MIN(purchasedate) min_date, MAX(expirydate) max_date FROM `project.dataset.table`),
UNNEST(GENERATE_DATE_ARRAY(min_date, max_date)) day
JOIN `project.dataset.table`
ON day BETWEEN purchasedate AND expirydate
GROUP BY day
You can test, play with above using dummy data from your question as in below example
#standardSQL
WITH `project.dataset.table` AS (
SELECT 1 userid, DATE '2019-01-01' purchasedate, DATE '2019-02-01' expirydate UNION ALL
SELECT 2, '2019-01-02', '2019-02-02' UNION ALL
SELECT 3, '2019-01-03', '2019-02-03' UNION ALL
SELECT 3, '2019-01-04', '2019-03-03'
)
SELECT day, COUNT(DISTINCT userid) active_subscriptions
FROM (SELECT AS STRUCT MIN(purchasedate) min_date, MAX(expirydate) max_date FROM `project.dataset.table`),
UNNEST(GENERATE_DATE_ARRAY(min_date, max_date)) day
JOIN `project.dataset.table`
ON day BETWEEN purchasedate AND expirydate
GROUP BY day
with below output
Row day active_subscriptions
1 2019-01-01 1
2 2019-01-02 2
3 2019-01-03 3
4 2019-01-04 3
5 2019-01-05 3
6 2019-01-06 3
... ... ...
... ... ...
31 2019-01-31 3
32 2019-02-01 3
33 2019-02-02 2
34 2019-02-03 1
35 2019-02-04 1
... ... ...
... ... ...
61 2019-03-02 1
62 2019-03-03 1

You need a list of dates and count(distinct):
select d.dte, count(distinct t.userid) as num_users
from (select distinct purchase_date as dte from t) d left join
t
on d.dte >= t.dte and
d.dte <= t.expiry_date
group by d.dte
order by d.dte;
EDIT:
BigQuery can be fickle about inequalities in the on clause. Here is another approach:
select dte, count(distinct t.userid) as num_users
from t cross join
unnest(generate_date_array(t.purchase_date, t.expiry_date, interval 1 day)) dte
group by dte
order by dte;
You can use a where clause to filter down to particular dates.

I make the table name 'test_expirydate' and use your data
and this one work
select
tb1.expirydate,
count(*) as total
from test_expirydate as tb1
left join (
select
expirydate
from test_expirydate as tb2
group by userid
) as tb2
on tb1.expirydate >= tb2.expirydate
group by tb1.expirydate
I don't sure is it work in other case or not but it fine with current data
Oh, I interpret that the left column should be the expiration date.

Related

SQL : create intermediate data from date range

I have a table as shown here:
USER
ROI
DATE
1
5
2021-11-24
1
4
2021-11-26
1
6
2021-11-29
I want to get the ROI for the dates in between the other dates, expected result will be as below
From 2021-11-24 to 2021-11-30
USER
ROI
DATE
1
5
2021-11-24
1
5
2021-11-25
1
4
2021-11-26
1
4
2021-11-27
1
4
2021-11-28
1
6
2021-11-29
1
6
2021-11-30
You may use a calendar table approach here. Create a table containing all dates and then join with it. Sans an actual table, you may use an inline CTE:
WITH dates AS (
SELECT '2021-11-24' AS dt UNION ALL
SELECT '2021-11-25' UNION ALL
SELECT '2021-11-26' UNION ALL
SELECT '2021-11-27' UNION ALL
SELECT '2021-11-28' UNION ALL
SELECT '2021-11-29' UNION ALL
SELECT '2021-11-30'
),
cte AS (
SELECT USER, ROI, DATE, LEAD(DATE) OVER (ORDER BY DATE) AS NEXT_DATE
FROM yourTable
)
SELECT t.USER, t.ROI, d.dt
FROM dates d
INNER JOIN cte t
ON d.dt >= t.DATE AND (d.dt < t.NEXT_DATE OR t.NEXT_DATE IS NULL)
ORDER BY d.dt;

SQL 30 day active user query

I have a table of users and how many events they fired on a given date:
DATE
USERID
EVENTS
2021-08-27
1
5
2021-07-25
1
7
2021-07-23
2
3
2021-07-20
3
9
2021-06-22
1
9
2021-05-05
1
4
2021-05-05
2
2
2021-05-05
3
6
2021-05-05
4
8
2021-05-05
5
1
I want to create a table showing number of active users for each date with active user being defined as someone who has fired an event on the given date or in any of the preceding 30 days.
DATE
ACTIVE_USERS
2021-08-27
1
2021-07-25
3
2021-07-23
2
2021-07-20
2
2021-06-22
1
2021-05-05
5
I tried the following query which returned only the users who were active on the specified date:
SELECT COUNT(DISTINCT USERID), DATE
FROM table
WHERE DATE >= (CURRENT_DATE() - interval '30 days')
GROUP BY 2 ORDER BY 2 DESC;
I also tried using a window function with rows between but seems to end up getting the same result:
SELECT
DATE,
SUM(ACTIVE_USERS) AS ACTIVE_USERS
FROM
(
SELECT
DATE,
CASE
WHEN SUM(EVENTS) OVER (PARTITION BY USERID ORDER BY DATE ROWS BETWEEN 30 PRECEDING AND CURRENT ROW) >= 1 THEN 1
ELSE 0
END AS ACTIVE_USERS
FROM table
)
GROUP BY 1
ORDER BY 1
I'm using SQL:ANSI on Snowflake. Any suggestions would be much appreciated.
This is tricky to do as window functions -- because count(distinct) is not permitted. You can use a self-join:
select t1.date, count(distinct t2.userid)
from table t join
table t2
on t2.date <= t.date and
t2.date > t.date - interval '30 day'
group by t1.date;
However, that can be expensive. One solution is to "unpivot" the data. That is, do an incremental count per user of going "in" and "out" of active states and then do a cumulative sum:
with d as ( -- calculate the dates with "ins" and "outs"
select user, date, +1 as inc
from table
union all
select user, date + interval '30 day', -1 as inc
from table
),
d2 as ( -- accumulate to get the net actives per day
select date, user, sum(inc) as change_on_day,
sum(sum(inc)) over (partition by user order by date) as running_inc
from d
group by date, user
),
d3 as ( -- summarize into active periods
select user, min(date) as start_date, max(date) as end_date
from (select d2.*,
sum(case when running_inc = 0 then 1 else 0 end) over (partition by user order by date) as active_period
from d2
) d2
where running_inc > 0
group by user
)
select d.date, count(d3.user)
from (select distinct date from table) d left join
d3
on d.date >= start_date and d.date < end_date
group by d.date;

Count new entries day by day

I would like to count new id's in each day. Saying new, I mean new relative to the day before.
Assume we have a table:
Date
Id
2021-01-01
1
2021-01-02
4
2021-01-02
5
2021-01-02
6
2021-01-03
1
2021-01-03
5
2021-01-03
7
My desired output, would look like this:
Date
Count(NewId)
2021-01-01
1
2021-01-02
3
2021-01-03
2
You can use two levels of aggregation:
select date, count(*)
from (select id, min(date) as date
from t
group by id
) i
group by date
order by date;
If by "relative to the day before" you mean that you want to count someone as new whenever they have no record on the previous day, then use lag() . . . carefully:
select date,
sum(case when prev_date = date - interval '1' day then 0 else 1 end)
from (select t.*,
lag(date) over (partition by id order by date) as prev_date
from t
) t
group by date
order by date;
here is another way, probably the simplest :
select t1.Date, count(*) from table t1
where id not in (select id from table t2 where t2.date = t1.date- interval '1 day')
group by t1.Date
Maybe this other option could also do the job, but being honest I would prefer the #GordonLinoff answer:
select date, count(*)
from your_table t
where not exists (
select 1
from your_table tt
where tt.Id=t.id
and tt.date = date_sub(t.date,1)
)
group by date

SQL query needed - Counting 365 days backwards

I have searched the forum many times but couldn't find a solution for my situation. I am working with an Oracle database.
I have a table with all Order Numbers and Customer Numbers by Day. It looks like this:
Day | Customer Nbr | Order Nbr
2018-01-05 | 25687459 | 256
2018-01-09 | 36478592 | 398
2018-03-07 | 25687459 | 1547
and so on....
Now I need a SQL Query which gives me a table by day and Customer Nbr and counts the number of unique Order Numbers within the last 365 days starting from column 1.
For the example above the resulting table should look like:
Day | Customer Nbr | Order Cnt
2019-01-01 | 25687459 | 2
2019-01-02 | 25687459 | 2
...
2019-03-01 | 25687459 | 1
One method is to generate values for all days of interest for each customer and then use a correlated subquery:
with dates as (
select date '2019-01-01' + rownum as dte from dual
connect by date '2019-01-01' + rownum < sysdate
)
select d.dte, t.customer_nbr,
(select count(*)
from t t2
where t2.customer_nbr = t.customer_nbr and
t2.day <= t.dte and
t2.date > t.dte - 365
) as order_cnt
from dates d cross join
(select distinct customer_nbr from t) ;
Edit:
I've just seen you clarify the question, which I've interpreted to mean:
For every day in the last year, show how many orders there were for each customer between that date, and 1 year previously. Working on an answer now...
Updated Answer:
For each customer, we count the number of records between the order day, and 365 days before it...
WITH yourTable AS
(
SELECT SYSDATE - 1 Day, 'Alex' CustomerNbr FROM DUAL
UNION ALL
SELECT SYSDATE - 2, 'Alex' FROM DUAL
UNION ALL
SELECT SYSDATE - 366, 'Alex'FROM DUAL
UNION ALL
SELECT SYSDATE - 400, 'Alex'FROM DUAL
UNION ALL
SELECT SYSDATE - 500, 'Alex'FROM DUAL
UNION ALL
SELECT SYSDATE - 1, 'Joe'FROM DUAL
UNION ALL
SELECT SYSDATE - 300, 'Chris'FROM DUAL
UNION ALL
SELECT SYSDATE - 1, 'Chris'FROM DUAL
)
SELECT Day, CustomerNbr, OrdersLast365Days
FROM yourTable t
OUTER APPLY
(
SELECT COUNT(1) OrdersLast365Days
FROM yourTable t2
WHERE t.CustomerNbr = t2.CustomerNbr
AND TRUNC(t2.Day) >= TRUNC(t.Day) - 364
AND TRUNC(t2.Day) <= TRUNC(t.Day)
)
ORDER BY t.Day DESC, t.CustomerNbr;
If you want to report on just the days you have orders for, then a simple WHERE clause should be enough:
SELECT Day, CustomerNbr, COUNT(1) OrderCount
FROM <yourTable>
WHERE TRUNC(DAY) >= TRUNC(SYSDATE -364)
GROUP BY Day, CustomerNbr
ORDER BY Day Desc;
If you want to report on every day, you'll need to generate them first. This can be done by a recursive CTE, which you then join to your table:
WITH last365Days AS
(
SELECT TRUNC (SYSDATE - ROWNUM + 1) dt
FROM DUAL CONNECT BY ROWNUM < 365
)
SELECT d.Day, COALESCE(t.CustomerNbr, 'None') CustomerNbr, SUM(CASE WHEN t.CustomerNbr IS NULL THEN 0 ELSE 1 END) OrderCount
FROM last365Days d
LEFT OUTER JOIN <yourTable> t
ON d.Day = TRUNC(t.Day)
GROUP BY d.Day, t.CustomerNbr
ORDER BY d.Day Desc;
I would probably have done it with and analytic function. In your windowing clause, you can specify a number of rows before, or a range. In this case I will use a range.
This will give you, For Each customer for each day the number of orders during one rolling year before the date displayed
WITH DATES AS (
SELECT * FROM
(SELECT TRUNC(SYSDATE)-(LEVEL-1) AS DAY FROM DUAL CONNECT BY TRUNC(SYSDATE)-(LEVEL-1) >= ( SELECT MIN(TRUNC(DAY)) FROM MY_TABLE ))
CROSS JOIN
(SELECT DISTINCT CUST_ID FROM MY_TABLE))
SELECT DISTINCT
DATES.DAY,
DATES.CUST_ID,
COUNT(ORDER_ID) OVER (PARTITION BY DATES.CUST_ID ORDER BY DATES.DAY RANGE BETWEEN INTERVAL '1' YEAR PRECEDING AND INTERVAL '1' SECOND PRECEDING)
FROM
DATES
LEFT JOIN
MY_TABLE
ON DATES.DAY=TRUNC(MY_TABLE.DAY) AND DATES.CUST_ID=MY_TABLE.CUST_ID
ORDER BY DATES.CUST_ID,DATES.DAY;

Add Missing monthly dates in a timeseries data in Postgresql

I have monthly time series data in table where dates are as a last day of month. Some of the dates are missing in the data. I want to insert those dates and put zero value for other attributes.
Table is as follows:
id report_date price
1 2015-01-31 40
1 2015-02-28 56
1 2015-04-30 34
2 2014-05-31 45
2 2014-08-31 47
I want to convert this table to
id report_date price
1 2015-01-31 40
1 2015-02-28 56
1 2015-03-31 0
1 2015-04-30 34
2 2014-05-31 45
2 2014-06-30 0
2 2014-07-31 0
2 2014-08-31 47
Is there any way we can do this in Postgresql?
Currently we are doing this in Python. As our data is growing day by day and its not efficient to handle I/O just for one task.
Thank you
You can do this using generate_series() to generate the dates and then left join to bring in the values:
with m as (
select id, min(report_date) as minrd, max(report_date) as maxrd
from t
group by id
)
select m.id, m.report_date, coalesce(t.price, 0) as price
from (select m.*, generate_series(minrd, maxrd, interval '1' month) as report_date
from m
) m left join
t
on m.report_date = t.report_date;
EDIT:
Turns out that the above doesn't quite work, because adding months to the end of month doesn't keep the last day of the month.
This is easily fixed:
with t as (
select 1 as id, date '2012-01-31' as report_date, 10 as price union all
select 1 as id, date '2012-04-30', 20
), m as (
select id, min(report_date) - interval '1 day' as minrd, max(report_date) - interval '1 day' as maxrd
from t
group by id
)
select m.id, m.report_date, coalesce(t.price, 0) as price
from (select m.*, generate_series(minrd, maxrd, interval '1' month) + interval '1 day' as report_date
from m
) m left join
t
on m.report_date = t.report_date;
The first CTE is just to generate sample data.
This is a slight improvement over Gordon's query which fails to get the last date of a month in some cases.
Essentially you generate all the month end dates between the min and max date for each id (using generate_series) and left join on this generated table to show the missing dates with 0 price.
with minmax as (
select id, min(report_date) as mindt, max(report_date) as maxdt
from t
group by id
)
select m.id, m.report_date, coalesce(t.price, 0) as price
from (select *,
generate_series(date_trunc('MONTH',mindt+interval '1' day),
date_trunc('MONTH',maxdt+interval '1' day),
interval '1' month) - interval '1 day' as report_date
from minmax
) m
left join t on m.report_date = t.report_date
Sample Demo