How to return the IDs of users who buy in several consecutive months? - sql

How can I get all user_id values from the data below where the same user_id appears in consecutive months, relative to a given date in the date column?
For example, given the table below:
date       | user_id
2018-11-01 | 13
2018-11-01 | 13
2018-11-01 | 14
2018-11-01 | 15
2018-12-01 | 13
2019-01-01 | 13
2019-01-01 | 14
...supposing I want the user_id values for consecutive months up to and including 2019-01 (i.e. before 2019-02-01), I'd expect this output:
user_id | m_year
13      | 2018-11
13      | 2018-12
13      | 2019-01
Perhaps a window function could be applied here?

If you just want to aggregate by user and year-month:
select
t.user_id,
to_char(date_trunc('month',t.date),'YYYY-MM') as m_year
from yourtable t
where t.date < '2019-02-01'::date
group by t.user_id, date_trunc('month',t.date)
order by t.user_id, m_year
But if you only want those with consecutive months, then a little extra is needed.
select
user_id,
to_char(ym,'YYYY-MM') as m_year
from
(
select t.user_id
, date_trunc('month',t.date) as ym
, lag(date_trunc('month',t.date))
over (partition by t.user_id order by date_trunc('month',t.date)) as prev_ym
, lead(date_trunc('month',t.date))
over (partition by t.user_id order by date_trunc('month',t.date)) as next_ym
from yourtable t
where t.date < '2019-02-01'::date
group by t.user_id, date_trunc('month',t.date)
) q
where (ym - prev_ym <= '31 days'::interval or
next_ym - ym <= '31 days'::interval)
order by user_id, ym
user_id | m_year
------: | :------
13 | 2018-11
13 | 2018-12
13 | 2019-01
db<>fiddle here
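Below is a minimal sketch of the lag()/lead() idea above. It runs against SQLite through Python's sqlite3 module, so no PostgreSQL server is needed.

Assumptions:
- SQLite 3.25+ is available, for window functions.
- Dates are stored as ISO-8601 text.
- strftime() and date() stand in for date_trunc() and interval arithmetic.

Instead of the 31-day interval test, it checks whether a neighbour is exactly one calendar month before or after. On month-truncated values, consecutive months are at most 31 days apart and non-consecutive ones at least 59, so both tests keep the same rows.

```python
import sqlite3

# Sample rows from the question: (date, user_id); "yourtable" as in the answer.
rows = [
    ("2018-11-01", 13), ("2018-11-01", 13), ("2018-11-01", 14),
    ("2018-11-01", 15), ("2018-12-01", 13), ("2019-01-01", 13),
    ("2019-01-01", 14),
]

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE yourtable (date TEXT, user_id INTEGER)")
con.executemany("INSERT INTO yourtable VALUES (?, ?)", rows)

query = """
WITH months AS (
    -- one row per user and month (the GROUP BY step of the answer)
    SELECT DISTINCT user_id, strftime('%Y-%m-01', date) AS ym
    FROM yourtable
    WHERE date < '2019-02-01'
), flagged AS (
    SELECT user_id, ym,
           lag(ym)  OVER (PARTITION BY user_id ORDER BY ym) AS prev_ym,
           lead(ym) OVER (PARTITION BY user_id ORDER BY ym) AS next_ym
    FROM months
)
SELECT user_id, substr(ym, 1, 7) AS m_year
FROM flagged
-- keep a month if a neighbour on either side is exactly one calendar month away
WHERE prev_ym = date(ym, '-1 month') OR next_ym = date(ym, '+1 month')
ORDER BY user_id, ym
"""
result = con.execute(query).fetchall()
print(result)
```

User 14 appears in 2018-11 and 2019-01 but not 2018-12, so neither of its months has an adjacent neighbour and it is filtered out.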

You don't need a window function for this specific query. Just try:
SELECT DISTINCT ON (user_id) user_id, date_trunc('month', date::date) AS m_year
FROM your_table
ORDER BY user_id, date;

Related

How to create a start and end date with no gaps from one date column and to sum a value within the dates

I am new to SQL and am coding in SQL Developer.
I have a table with 4 columns: patient ID (ptid), service date (dt), insurance payment amount (insr_amt) and out-of-pocket payment amount (op_amt) (see Table 1 below).
What I would like to do is:

(1) Create two columns, "start_dt" and "end_dt", from the "dt" column. If a patient's service dates have no gaps, start_dt and end_dt should be that patient's first and last dates. If there is a gap, there should be a separate start/end row for each run of consecutive dates.

(2) Sum the two payment amounts per patient within each start/end date range (see Table 2 below).

How would I do this with SQL in SQL Developer?
Thank you!
Table 1:
Ptid | dt       | insr_amt | op_amt
A    | 1/1/2021 | 30       | 20
A    | 1/2/2021 | 30       | 10
A    | 1/3/2021 | 30       | 10
A    | 1/4/2021 | 30       | 30
B    | 1/6/2021 | 10       | 10
B    | 1/7/2021 | 20       | 10
C    | 2/1/2021 | 15       | 30
C    | 2/2/2021 | 15       | 30
C    | 2/6/2021 | 60       | 30
Table 2:
Ptid | start_dt | end_dt   | total_insr_amt | total_op_amt
A    | 1/1/2021 | 1/4/2021 | 120            | 70
B    | 1/6/2021 | 1/7/2021 | 30             | 20
C    | 2/1/2021 | 2/2/2021 | 30             | 60
C    | 2/6/2021 | 2/6/2021 | 60             | 30
You didn't mention the specific database, so this solution is for PostgreSQL. You can do:
select
ptid,
min(dt) as start_dt,
max(dt) as end_dt,
sum(insr_amt) as total_insr_amt,
sum(op_amt) as total_op_amt
from (
select *,
sum(inc) over(partition by ptid order by dt) as grp
from (
select *,
case when dt - interval '1 day' = lag(dt) over(partition by ptid order by dt)
then 0 else 1 end as inc
from t
) x
) y
group by ptid, grp
order by ptid, grp
Result:
ptid start_dt end_dt total_insr_amt total_op_amt
----- ---------- ---------- -------------- -----------
A 2021-01-01 2021-01-04 120 70
B 2021-01-06 2021-01-07 30 20
C 2021-02-01 2021-02-02 30 60
C 2021-02-06 2021-02-06 60 30
See running example at DB Fiddle 1.
EDIT for Oracle
As requested, the modified query that works in Oracle is:
select
ptid,
min(dt) as start_dt,
max(dt) as end_dt,
sum(insr_amt) as total_insr_amt,
sum(op_amt) as total_op_amt
from (
select x.*,
sum(inc) over(partition by ptid order by dt) as grp
from (
select t.*,
case when dt - 1 = lag(dt) over(partition by ptid order by dt)
then 0 else 1 end as inc
from t
) x
) y
group by ptid, grp
order by ptid, grp
See running example at db<>fiddle 2.
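The same gaps-and-islands pattern can be tried in SQLite from Python. This is a sketch under these assumptions:
- SQLite 3.25+ is available.
- Table 1's dates are converted to ISO YYYY-MM-DD text.
- date(dt, '-1 day') replaces the PostgreSQL interval and the Oracle dt - 1.

```python
import sqlite3

# Table 1 from the question, with dates rewritten as ISO strings.
rows = [
    ("A", "2021-01-01", 30, 20), ("A", "2021-01-02", 30, 10),
    ("A", "2021-01-03", 30, 10), ("A", "2021-01-04", 30, 30),
    ("B", "2021-01-06", 10, 10), ("B", "2021-01-07", 20, 10),
    ("C", "2021-02-01", 15, 30), ("C", "2021-02-02", 15, 30),
    ("C", "2021-02-06", 60, 30),
]

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE t (ptid TEXT, dt TEXT, insr_amt INTEGER, op_amt INTEGER)")
con.executemany("INSERT INTO t VALUES (?, ?, ?, ?)", rows)

query = """
SELECT ptid, min(dt) AS start_dt, max(dt) AS end_dt,
       sum(insr_amt) AS total_insr_amt, sum(op_amt) AS total_op_amt
FROM (
    -- running total of the "new island" flags numbers each island
    SELECT *, sum(inc) OVER (PARTITION BY ptid ORDER BY dt) AS grp
    FROM (
        -- inc = 0 when the previous visit was exactly one day earlier, else 1
        SELECT *,
               CASE WHEN date(dt, '-1 day') = lag(dt) OVER (PARTITION BY ptid ORDER BY dt)
                    THEN 0 ELSE 1 END AS inc
        FROM t
    ) x
) y
GROUP BY ptid, grp
ORDER BY ptid, grp
"""
result = con.execute(query).fetchall()
for row in result:
    print(row)
```

For patient C, the flags come out as 1, 0, 1, so the running sum gives groups 1, 1, 2. That is why 2/6 becomes its own row.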

Populating grid dates based on timestamps

I have the following two tables that I want to join:
select
user_id
, date
from users
where user_id = 13
order by date desc
user_id | date
13      | 2020-06-31
13      | 2020-06-30
13      | 2020-06-29
13      | 2020-06-28
13      | 2020-06-27
13      | 2020-06-26
select
user_id
, date
, orders_count_sum
from orders
where user_id = 13
order by date desc
user_id | date       | orders_count_sum
13      | 2020-06-30 | 3
13      | 2020-06-27 | 2
13      | 2020-06-26 | 1
I want to join the orders table to users, so that I get orders_count_sum populated over dates:
user_id | date       | orders_count_sum
13      | 2020-06-31 | 3
13      | 2020-06-30 | 3
13      | 2020-06-29 | 2
13      | 2020-06-28 | 2
13      | 2020-06-27 | 2
13      | 2020-06-26 | 1
A left join here only shows orders_count_sum on the dates that exist in the second table. How can I populate every date in the users table with the latest orders_count_sum as of that date?
You can use lead() and then join:
select u.user_id, u.date, o.orders_count_sum
from users u join
(select o.*,
lead(date) over (partition by o.user_id order by date) as next_date
from orders o
) o
on o.user_id = u.user_id and
u.date >= o.date and
(u.date < o.next_date or o.next_date is null);
You can filter down to a particular user if you also want that.
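A runnable sketch of the lead() range join, using SQLite via Python's sqlite3. It assumes SQLite 3.25+, and the impossible date 2020-06-31 is left out of the users rows. Each order row is treated as valid from its own date up to, but not including, the next order date.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE users (user_id INTEGER, date TEXT)")
con.execute("CREATE TABLE orders (user_id INTEGER, date TEXT, orders_count_sum INTEGER)")
# Sample data from the question, minus 2020-06-31.
con.executemany("INSERT INTO users VALUES (13, ?)",
                [(d,) for d in ("2020-06-26", "2020-06-27", "2020-06-28",
                                "2020-06-29", "2020-06-30")])
con.executemany("INSERT INTO orders VALUES (13, ?, ?)",
                [("2020-06-26", 1), ("2020-06-27", 2), ("2020-06-30", 3)])

query = """
SELECT u.user_id, u.date, o.orders_count_sum
FROM users u
JOIN (
    -- attach the next order date so each order covers a half-open date range
    SELECT o.*, lead(date) OVER (PARTITION BY o.user_id ORDER BY date) AS next_date
    FROM orders o
) o
  ON o.user_id = u.user_id
 AND u.date >= o.date
 AND (u.date < o.next_date OR o.next_date IS NULL)
ORDER BY u.date DESC
"""
result = con.execute(query).fetchall()
print(result)
```

Because this is an inner join, users dates earlier than the first order would drop out. A left join would keep them with a NULL count.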
Consider the approach below:
select user_id, date,
last_value(orders_count_sum ignore nulls) over win orders_count_sum
from users u
left join orders o
using (user_id, date)
window win as (partition by user_id order by date rows between unbounded preceding and current row)
Applied to the sample data in your question, this fills each date with the latest orders_count_sum up to that date.
As you might have noticed, I changed the month in your sample data from 06 to 05, as there is no such date as June 31 (unless you live in the world of Priestley's fantasy book).
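If your database has no IGNORE NULLS option for last_value(), one common emulation can be swapped in; it is not the technique the answer uses. It works in two steps:
1. Number the groups with a running count() of the non-NULL values.
2. Take max() within each group.

Below is a sketch in SQLite via Python, using the answer's May dates.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE users (user_id INTEGER, date TEXT)")
con.execute("CREATE TABLE orders (user_id INTEGER, date TEXT, orders_count_sum INTEGER)")
# The answer's variant of the sample data: month 06 changed to 05.
con.executemany("INSERT INTO users VALUES (13, ?)",
                [(f"2020-05-{d}",) for d in range(26, 32)])
con.executemany("INSERT INTO orders VALUES (13, ?, ?)",
                [("2020-05-26", 1), ("2020-05-27", 2), ("2020-05-30", 3)])

query = """
SELECT user_id, date,
       -- every row in a group shares the single non-NULL value that started it
       max(orders_count_sum) OVER (PARTITION BY user_id, grp) AS orders_count_sum
FROM (
    SELECT u.user_id, u.date, o.orders_count_sum,
           -- count() skips NULLs, so grp only advances on dates that have an order
           count(o.orders_count_sum) OVER (PARTITION BY u.user_id ORDER BY u.date) AS grp
    FROM users u
    LEFT JOIN orders o USING (user_id, date)
) AS filled
ORDER BY date DESC
"""
result = con.execute(query).fetchall()
print(result)
```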

SQL - Query to return active subscriptions on a given day

I have a table that shows when a user signs up for a subscription and when their membership will expire. A user can purchase a new subscription even if their current one is in force.
userid|purchasedate|expirydate
1 |2019-01-01 |2019-02-01
2 |2019-01-02 |2019-02-02
3 |2019-01-03 |2019-02-03
3 |2019-01-04 |2019-03-03
I need a SQL query that will GROUP BY the date and return the number of active subscriptions on that date. So it would return:
date |count
2019-01-01|1
2019-01-02|2
2019-01-03|3
2019-01-04|3
Below is for BigQuery Standard SQL
#standardSQL
SELECT day, COUNT(DISTINCT userid) active_subscriptions
FROM (SELECT AS STRUCT MIN(purchasedate) min_date, MAX(expirydate) max_date FROM `project.dataset.table`),
UNNEST(GENERATE_DATE_ARRAY(min_date, max_date)) day
JOIN `project.dataset.table`
ON day BETWEEN purchasedate AND expirydate
GROUP BY day
You can test and play with the above using the dummy data from your question, as in the example below:
#standardSQL
WITH `project.dataset.table` AS (
SELECT 1 userid, DATE '2019-01-01' purchasedate, DATE '2019-02-01' expirydate UNION ALL
SELECT 2, '2019-01-02', '2019-02-02' UNION ALL
SELECT 3, '2019-01-03', '2019-02-03' UNION ALL
SELECT 3, '2019-01-04', '2019-03-03'
)
SELECT day, COUNT(DISTINCT userid) active_subscriptions
FROM (SELECT AS STRUCT MIN(purchasedate) min_date, MAX(expirydate) max_date FROM `project.dataset.table`),
UNNEST(GENERATE_DATE_ARRAY(min_date, max_date)) day
JOIN `project.dataset.table`
ON day BETWEEN purchasedate AND expirydate
GROUP BY day
with the output below:
Row day active_subscriptions
1 2019-01-01 1
2 2019-01-02 2
3 2019-01-03 3
4 2019-01-04 3
5 2019-01-05 3
6 2019-01-06 3
... ... ...
... ... ...
31 2019-01-31 3
32 2019-02-01 3
33 2019-02-02 2
34 2019-02-03 1
35 2019-02-04 1
... ... ...
... ... ...
61 2019-03-02 1
62 2019-03-03 1
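SQLite has no GENERATE_DATE_ARRAY, but a recursive CTE can build the same calendar. This sketch runs it from Python and assumes a hypothetical table name, subs. Counting DISTINCT userid keeps user 3's overlapping purchases from being counted twice.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE subs (userid INTEGER, purchasedate TEXT, expirydate TEXT)")
con.executemany("INSERT INTO subs VALUES (?, ?, ?)", [
    (1, "2019-01-01", "2019-02-01"),
    (2, "2019-01-02", "2019-02-02"),
    (3, "2019-01-03", "2019-02-03"),
    (3, "2019-01-04", "2019-03-03"),
])

query = """
WITH RECURSIVE calendar(day) AS (
    -- every day from the first purchase to the last expiry (inclusive)
    SELECT min(purchasedate) FROM subs
    UNION ALL
    SELECT date(day, '+1 day') FROM calendar
    WHERE day < (SELECT max(expirydate) FROM subs)
)
SELECT c.day, count(DISTINCT s.userid) AS active_subscriptions
FROM calendar c
JOIN subs s ON c.day BETWEEN s.purchasedate AND s.expirydate
GROUP BY c.day
ORDER BY c.day
"""
counts = dict(con.execute(query).fetchall())
print(counts["2019-01-04"], counts["2019-02-02"], len(counts))
```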
You need a list of dates and count(distinct):
select d.dte, count(distinct t.userid) as num_users
from (select distinct purchase_date as dte from t) d left join
t
on d.dte >= t.purchase_date and
d.dte <= t.expiry_date
group by d.dte
order by d.dte;
EDIT:
BigQuery can be fickle about inequalities in the on clause. Here is another approach:
select dte, count(distinct t.userid) as num_users
from t cross join
unnest(generate_date_array(t.purchase_date, t.expiry_date, interval 1 day)) dte
group by dte
order by dte;
You can use a where clause to filter down to particular dates.
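The per-row expansion idea from the EDIT can also be sketched in plain Python; the function name active_per_day is hypothetical. Each subscription is expanded into its days, both endpoints inclusive, and users are collected in a set per day. No inequality join is needed.

```python
from collections import defaultdict
from datetime import date, timedelta

subscriptions = [
    (1, date(2019, 1, 1), date(2019, 2, 1)),
    (2, date(2019, 1, 2), date(2019, 2, 2)),
    (3, date(2019, 1, 3), date(2019, 2, 3)),
    (3, date(2019, 1, 4), date(2019, 3, 3)),
]

def active_per_day(subs):
    """Expand each subscription into its days (like UNNEST(GENERATE_DATE_ARRAY(...)))
    and count distinct users per day (like COUNT(DISTINCT userid) ... GROUP BY dte)."""
    users_by_day = defaultdict(set)
    for userid, start, end in subs:
        day = start
        while day <= end:  # inclusive end, as in GENERATE_DATE_ARRAY
            users_by_day[day].add(userid)
            day += timedelta(days=1)
    return {day: len(users) for day, users in sorted(users_by_day.items())}

counts = active_per_day(subscriptions)
print(counts[date(2019, 1, 4)])
```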
I named the table 'test_expirydate', loaded your data, and this works:
select
tb1.expirydate,
count(*) as total
from test_expirydate as tb1
left join (
select
expirydate
from test_expirydate as tb2
group by userid
) as tb2
on tb1.expirydate >= tb2.expirydate
group by tb1.expirydate
I'm not sure whether it works in other cases, but it works with the current data.
Note: I interpreted the left column as the expiration date.

How do I divide and add a column?

I have a list of people's IDs and dates. Each row records when a person entered the website (their ID and the date).
How can I show, for every date, how many people entered the site two days in a row?
The data (30,000 rows like this, on different dates):
01/03/2019 4616
01/03/2019 17584
01/03/2019 7812
01/03/2019 34
01/03/2019 12177
01/03/2019 7129
01/03/2019 11660
01/03/2019 2428
01/03/2019 17514
01/03/2019 10781
01/03/2019 7629
01/03/2019 11119
I managed to show the number of people who entered the site on each day, but I couldn't add a column showing the people who entered two days in a row.
date number_of_entrance
2019-03-01 7099
2019-03-02 7021
2019-03-03 7195
2019-03-04 7151
2019-03-05 7260
2019-03-06 7169
2019-03-07 7076
2019-03-08 7081
2019-03-09 6987
2019-03-10 7172
select date,count(*) as number_of_entrance
FROM [finalaa].[dbo].[Daily_Activity]
group by Date
order by date;
I would just use lag():
select count(distinct person)
from (select t.*,
lag(date) over (partition by person order by date) as prev_date
from t
) t
where prev_date = dateadd(day, -1, date);
Your code suggests SQL Server, so I used the date functions in that database.
If you want this per date:
select date, count(distinct person)
from (select t.*,
lag(date) over (partition by person order by date) as prev_date
from t
) t
where prev_date = dateadd(day, -1, date)
group by date;
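Below is a sketch of the per-date lag() query in SQLite via Python. It uses a small made-up visit log, not the question's data. The person column name follows the answer, and date(date, '-1 day') stands in for SQL Server's dateadd(day, -1, date).

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE t (person INTEGER, date TEXT)")
# Made-up visits: (person, ISO date).
con.executemany("INSERT INTO t VALUES (?, ?)", [
    (1, "2019-03-01"), (1, "2019-03-02"), (1, "2019-03-03"),
    (2, "2019-03-01"), (2, "2019-03-03"),
    (3, "2019-03-02"), (3, "2019-03-03"),
])

query = """
SELECT date, count(DISTINCT person) AS entered_two_days_in_a_row
FROM (
    SELECT t.*, lag(date) OVER (PARTITION BY person ORDER BY date) AS prev_date
    FROM t
) x
-- the previous visit must be exactly the day before
WHERE prev_date = date(date, '-1 day')
GROUP BY date
ORDER BY date
"""
result = con.execute(query).fetchall()
print(result)
```

Person 2 skips 2019-03-02, so their 2019-03-03 visit does not count. Dates with no qualifying person (here 2019-03-01) do not appear in this form of the query.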
You can use a subquery that returns, for each date, the number of people who entered on both that day and the previous day:
select
t.date,
count(*) as number_of_entrance,
(
SELECT COUNT(g.id) FROM (
SELECT id
FROM [Daily_Activity]
WHERE date IN (t.date, DATEADD(day, -1, t.date))
GROUP BY id
HAVING COUNT(DISTINCT date) = 2
) g
) number_of_entrance_2_days_in_a_row
FROM [Daily_Activity] t
group by t.date
order by t.date;
Replace id with the 2nd column's name in the table.
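The subquery's rule can also be written as a set intersection: an id counts for a date when it appears on that date and on the day before. Below is a plain-Python sketch with a hypothetical helper, two_days_in_a_row, on made-up data. Like the subquery version, it returns a value (possibly 0) for every date present.

```python
from collections import defaultdict
from datetime import date, timedelta

def two_days_in_a_row(visits):
    """For each date, count ids that also visited on the previous day.

    visits: iterable of (visit_date, id) pairs; duplicate rows are harmless
    because ids are collected into sets.
    """
    ids_by_day = defaultdict(set)
    for day, visitor in visits:
        ids_by_day[day].add(visitor)
    return {
        day: len(ids & ids_by_day.get(day - timedelta(days=1), set()))
        for day, ids in sorted(ids_by_day.items())
    }

visits = [
    (date(2019, 3, 1), 1), (date(2019, 3, 2), 1), (date(2019, 3, 3), 1),
    (date(2019, 3, 1), 2), (date(2019, 3, 3), 2),
    (date(2019, 3, 2), 3), (date(2019, 3, 3), 3),
]
result = two_days_in_a_row(visits)
print(result)
```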

Cumulative sum of values by month, filling in for missing months

I have this data table, and I'm wondering whether it's possible to write a query that gets a cumulative sum by month, covering all months up to the current month.
date_added | qty
------------------------------------
2015-08-04 22:28:24.633784-03 | 1
2015-05-20 20:22:29.458541-03 | 1
2015-04-08 14:16:09.844229-03 | 1
2015-04-07 23:10:42.325081-03 | 1
2015-07-06 18:50:30.164932-03 | 1
2015-08-22 15:01:54.03697-03 | 1
2015-08-06 18:25:07.57763-03 | 1
2015-04-07 23:12:20.850783-03 | 1
2015-07-23 17:45:29.456034-03 | 1
2015-04-28 20:12:48.110922-03 | 1
2015-04-28 13:26:04.770365-03 | 1
2015-05-19 13:30:08.186289-03 | 1
2015-08-06 18:26:46.448608-03 | 1
2015-08-27 16:43:06.561005-03 | 1
2015-08-07 12:15:29.242067-03 | 1
I need a result like this:
Jan|0
Feb|0
Mar|0
Apr|5
May|7
Jun|7
Jul|9
Aug|15
This is very similar to other questions, but the best query is still tricky.
Basic query to get the running sum quickly:
SELECT to_char(date_trunc('month', date_added), 'Mon YYYY') AS mon_text
, sum(sum(qty)) OVER (ORDER BY date_trunc('month', date_added)) AS running_sum
FROM tbl
GROUP BY date_trunc('month', date_added)
ORDER BY date_trunc('month', date_added);
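To try the running sum without PostgreSQL, here is a sketch in SQLite via Python. It differs from the query above in three ways:
- It uses an equivalent two-step form (monthly totals in a CTE, then sum() OVER) rather than the nested sum(sum(qty)).
- strftime('%Y-%m', ...) replaces date_trunc().
- Timestamps are trimmed to dates, so time zones are ignored.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE tbl (date_added TEXT, qty INTEGER)")
# The question's rows, date part only; every qty is 1.
dates = ["2015-08-04", "2015-05-20", "2015-04-08", "2015-04-07", "2015-07-06",
         "2015-08-22", "2015-08-06", "2015-04-07", "2015-07-23", "2015-04-28",
         "2015-04-28", "2015-05-19", "2015-08-06", "2015-08-27", "2015-08-07"]
con.executemany("INSERT INTO tbl VALUES (?, 1)", [(d,) for d in dates])

query = """
WITH monthly AS (
    -- step 1: one total per month
    SELECT strftime('%Y-%m', date_added) AS mon, sum(qty) AS mon_sum
    FROM tbl
    GROUP BY 1
)
-- step 2: accumulate the monthly totals in month order
SELECT mon, sum(mon_sum) OVER (ORDER BY mon) AS running_sum
FROM monthly
ORDER BY mon
"""
result = con.execute(query).fetchall()
print(result)
```

June is missing from the output because no rows fall in it, which is exactly the gap the next query fills.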
The tricky part is to fill in for missing months:
WITH cte AS (
SELECT date_trunc('month', date_added) AS mon, sum(qty) AS mon_sum
FROM tbl
GROUP BY 1
)
SELECT to_char(mon, 'Mon YYYY') AS mon_text
, sum(c.mon_sum) OVER (ORDER BY mon) AS running_sum
FROM (SELECT min(mon) AS min_mon FROM cte) init
, generate_series(init.min_mon, now(), interval '1 month') mon
LEFT JOIN cte c USING (mon)
ORDER BY mon;
The implicit CROSS JOIN LATERAL requires Postgres 9.3+. This starts with the first month in the table.
To start with a given month:
WITH cte AS (
SELECT date_trunc('month', date_added) AS mon, sum(qty) AS mon_sum
FROM tbl
GROUP BY 1
)
SELECT to_char(mon, 'Mon YYYY') AS mon_text
, COALESCE(sum(c.mon_sum) OVER (ORDER BY mon), 0) AS running_sum
FROM generate_series('2015-01-01'::date, now(), interval '1 month') mon
LEFT JOIN cte c USING (mon)
ORDER BY mon;
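Filling in the missing months can be sketched in SQLite with a recursive CTE in place of generate_series(). The month range is hard-coded to 2015-01 through 2015-08 (an assumption, instead of now()) so the output is stable. As in the query above, COALESCE turns the empty leading months into 0.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE tbl (date_added TEXT, qty INTEGER)")
dates = ["2015-08-04", "2015-05-20", "2015-04-08", "2015-04-07", "2015-07-06",
         "2015-08-22", "2015-08-06", "2015-04-07", "2015-07-23", "2015-04-28",
         "2015-04-28", "2015-05-19", "2015-08-06", "2015-08-27", "2015-08-07"]
con.executemany("INSERT INTO tbl VALUES (?, 1)", [(d,) for d in dates])

query = """
WITH RECURSIVE months(mon) AS (
    -- first day of each month from Jan to Aug 2015
    SELECT '2015-01-01'
    UNION ALL
    SELECT date(mon, '+1 month') FROM months WHERE mon < '2015-08-01'
), monthly AS (
    SELECT strftime('%Y-%m-01', date_added) AS mon, sum(qty) AS mon_sum
    FROM tbl
    GROUP BY 1
)
SELECT strftime('%Y-%m', m.mon) AS mon,
       COALESCE(sum(c.mon_sum) OVER (ORDER BY m.mon), 0) AS running_sum
FROM months m
LEFT JOIN monthly c ON c.mon = m.mon  -- months without rows get NULL mon_sum
ORDER BY m.mon
"""
result = con.execute(query).fetchall()
print(result)
```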
db<>fiddle here
Old sqlfiddle
This keeps months from different years apart. You did not ask for that, but you'll most likely want it.
Note that the "month" to some degree depends on the time zone setting of the current session! Details:
Ignoring time zones altogether in Rails and PostgreSQL
Related:
Calculating Cumulative Sum in PostgreSQL
PostgreSQL: running count of rows for a query 'by minute'
Postgres window function and group by exception