Count new entries day by day - sql

I would like to count the new ids on each day. By "new", I mean new relative to the day before.
Assume we have a table:
Date       |Id
2021-01-01 |1
2021-01-02 |4
2021-01-02 |5
2021-01-02 |6
2021-01-03 |1
2021-01-03 |5
2021-01-03 |7
My desired output would look like this:
Date       |Count(NewId)
2021-01-01 |1
2021-01-02 |3
2021-01-03 |2

You can use two levels of aggregation:
select date, count(*)
from (select id, min(date) as date
from t
group by id
) i
group by date
order by date;
If by "relative to the day before" you mean that you want to count someone as new whenever they have no record on the previous day, then use lag() . . . carefully:
select date,
sum(case when prev_date = date - interval '1' day then 0 else 1 end)
from (select t.*,
lag(date) over (partition by id order by date) as prev_date
from t
) t
group by date
order by date;
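The two interpretations above give different counts on the sample data, which is easy to check. This is a minimal sketch, assuming SQLite (through Python's sqlite3 module) as a stand-in engine and a table named t with columns date and id. SQLite has no interval syntax, so the sketch uses date(date, '-1 day') instead.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("create table t (date text, id integer)")
conn.executemany(
    "insert into t values (?, ?)",
    [("2021-01-01", 1),
     ("2021-01-02", 4), ("2021-01-02", 5), ("2021-01-02", 6),
     ("2021-01-03", 1), ("2021-01-03", 5), ("2021-01-03", 7)],
)

# Interpretation 1: an id is new only on the first day it ever appears.
first_seen = conn.execute("""
    select date, count(*)
    from (select id, min(date) as date from t group by id) i
    group by date
    order by date
""").fetchall()

# Interpretation 2: an id is new whenever it has no row on the previous day.
# A null prev_date (first appearance) falls through to the else branch.
not_seen_yesterday = conn.execute("""
    select date,
           sum(case when prev_date = date(date, '-1 day') then 0 else 1 end)
    from (select t.*,
                 lag(date) over (partition by id order by date) as prev_date
          from t) t
    group by date
    order by date
""").fetchall()

print(first_seen)
print(not_seen_yesterday)
```

Only the second result matches the desired output in the question, because id 1 reappears on 2021-01-03 after being absent on 2021-01-02.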

Here is another way, probably the simplest:
select t1.Date, count(*) from table t1
where id not in (select id from table t2 where t2.date = t1.date- interval '1 day')
group by t1.Date

Maybe this other option could also do the job, but to be honest I would prefer @GordonLinoff's answer:
select date, count(*)
from your_table t
where not exists (
select 1
from your_table tt
where tt.Id=t.id
and tt.date = date_sub(t.date,1)
)
group by date

Related

SQL 30 day active user query

I have a table of users and how many events they fired on a given date:
DATE       |USERID |EVENTS
2021-08-27 |1      |5
2021-07-25 |1      |7
2021-07-23 |2      |3
2021-07-20 |3      |9
2021-06-22 |1      |9
2021-05-05 |1      |4
2021-05-05 |2      |2
2021-05-05 |3      |6
2021-05-05 |4      |8
2021-05-05 |5      |1
I want to create a table showing number of active users for each date with active user being defined as someone who has fired an event on the given date or in any of the preceding 30 days.
DATE       |ACTIVE_USERS
2021-08-27 |1
2021-07-25 |3
2021-07-23 |2
2021-07-20 |2
2021-06-22 |1
2021-05-05 |5
I tried the following query which returned only the users who were active on the specified date:
SELECT COUNT(DISTINCT USERID), DATE
FROM table
WHERE DATE >= (CURRENT_DATE() - interval '30 days')
GROUP BY 2 ORDER BY 2 DESC;
I also tried using a window function with rows between, but it seems to give the same result:
SELECT
DATE,
SUM(ACTIVE_USERS) AS ACTIVE_USERS
FROM
(
SELECT
DATE,
CASE
WHEN SUM(EVENTS) OVER (PARTITION BY USERID ORDER BY DATE ROWS BETWEEN 30 PRECEDING AND CURRENT ROW) >= 1 THEN 1
ELSE 0
END AS ACTIVE_USERS
FROM table
)
GROUP BY 1
ORDER BY 1
I'm using SQL:ANSI on Snowflake. Any suggestions would be much appreciated.
This is tricky to do with window functions, because count(distinct) is not permitted. You can use a self-join:
select t1.date, count(distinct t2.userid)
from table t1 join
table t2
on t2.date <= t1.date and
t2.date > t1.date - interval '30 day'
group by t1.date;
However, that can be expensive. One solution is to "unpivot" the data. That is, do an incremental count per user of going "in" and "out" of active states and then do a cumulative sum:
with d as ( -- calculate the dates with "ins" and "outs"
select userid, date, +1 as inc
from table
union all
select userid, date + interval '30 day', -1 as inc
from table
),
d2 as ( -- accumulate to get the net actives per day
select date, userid, sum(inc) as change_on_day,
sum(sum(inc)) over (partition by userid order by date) as running_inc
from d
group by date, userid
),
d3 as ( -- summarize into active periods; the row where running_inc returns to 0 closes its period
select userid, min(date) as start_date, max(date) as end_date
from (select d2.*,
coalesce(sum(case when running_inc = 0 then 1 else 0 end) over (partition by userid order by date rows between unbounded preceding and 1 preceding), 0) as active_period
from d2
) d2
group by userid, active_period
)
select d.date, count(d3.userid)
from (select distinct date from table) d left join
d3
on d.date >= start_date and d.date < end_date
group by d.date;
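Both ideas can be tried against the sample data from the question. What follows is a sketch under stated assumptions, not the Snowflake query itself: it runs in SQLite through Python's sqlite3 module, the table is called events (a made-up name), and date(x, '+31 day') stands in for interval arithmetic. It follows the question's wording (the given date or any of the preceding 30 days), so a user leaves the active state 31 days after an event. The daily net change is also computed in its own CTE rather than as a nested sum(sum(...)).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("create table events (date text, userid integer, events integer)")
conn.executemany("insert into events values (?, ?, ?)", [
    ("2021-08-27", 1, 5), ("2021-07-25", 1, 7), ("2021-07-23", 2, 3),
    ("2021-07-20", 3, 9), ("2021-06-22", 1, 9), ("2021-05-05", 1, 4),
    ("2021-05-05", 2, 2), ("2021-05-05", 3, 6), ("2021-05-05", 4, 8),
    ("2021-05-05", 5, 1),
])

# Self-join: for each date, count distinct users seen on that date
# or on any of the 30 days before it.
self_join = conn.execute("""
    select t1.date, count(distinct t2.userid) as active_users
    from events t1
    join events t2
      on t2.date <= t1.date
     and t2.date >= date(t1.date, '-30 day')
    group by t1.date
    order by t1.date desc
""").fetchall()

# In/out counting: +1 on the event date, -1 on the first inactive day.
in_out = conn.execute("""
    with d as (
        select userid, date, 1 as inc from events
        union all
        select userid, date(date, '+31 day'), -1 from events
    ),
    net as (   -- net change per user and day
        select userid, date, sum(inc) as change_on_day
        from d
        group by userid, date
    ),
    d2 as (    -- number of still-open 30-day windows per user
        select userid, date,
               sum(change_on_day) over (partition by userid order by date) as running_inc
        from net
    ),
    d3 as (    -- the row where running_inc returns to 0 closes its period
        select userid, min(date) as start_date, max(date) as end_date
        from (select d2.*,
                     coalesce(sum(case when running_inc = 0 then 1 else 0 end)
                              over (partition by userid order by date
                                    rows between unbounded preceding and 1 preceding), 0)
                         as active_period
              from d2) p
        group by userid, active_period
    )
    select cal.date, count(d3.userid) as active_users
    from (select distinct date from events) cal
    left join d3
      on cal.date >= d3.start_date and cal.date < d3.end_date
    group by cal.date
    order by cal.date desc
""").fetchall()

print(self_join)
print(in_out == self_join)
```

On this data both queries reproduce the ACTIVE_USERS column from the question. Replacing >= with the strict > in the self-join would shorten the window to the date itself plus the 29 days before it.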

How to cross join but using latest value in BIGQUERY

I have this table below
date       |id |value
2021-01-01 |1  |3
2021-01-04 |1  |5
2021-01-05 |1  |10
I expect output like this, where the date column increases one day at a time and the value column carries forward the last known value for each id:
date       |id |value
2021-01-01 |1  |3
2021-01-02 |1  |3
2021-01-03 |1  |3
2021-01-04 |1  |5
2021-01-05 |1  |10
2021-01-06 |1  |10
I think I can use a cross join, but I can't get my expected output, and I suspect there is a special syntax or logic to solve this.
Consider below approach
select * from `project.dataset.table`
union all
select missing_date, prev_row.id, prev_row.value
from (
select *, lag(t) over(partition by id order by date) prev_row
from `project.dataset.table` t
), unnest(generate_date_array(prev_row.date + 1, date - 1)) missing_date
I would write this using:
select dte, t.id, t.value
from (select t.*,
lead(date, 1, date '2021-01-06') over (partition by id order by date) as next_date
from `table` t
) t cross join
unnest(generate_date_array(
date,
ifnull(
date_add(next_date, interval -1 day), -- generate missing date rows
(select max(date) from `table`) -- add last row
)
)) dte;
Note that this requires neither union all nor window function to fill in the values.
An alternative solution uses last_value. You may explore the following query and customize the logic that generates the days (if needed):
WITH
query AS (
SELECT
date,
id,
value
FROM
`mydataset.newtable`
ORDER BY
date ),
generated_days AS (
SELECT
day
FROM (
SELECT
MIN(date) min_dt,
MAX(date) max_dt
FROM
query),
UNNEST(GENERATE_DATE_ARRAY(min_dt, max_dt)) day )
SELECT
g.day,
LAST_VALUE(q.id IGNORE NULLS) OVER(ORDER BY g.day) id,
LAST_VALUE(q.value IGNORE NULLS) OVER(ORDER BY g.day) value,
FROM
generated_days g
LEFT OUTER JOIN
query q
ON
g.day = q.date
ORDER BY
g.day
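For readers without BigQuery at hand, the same forward fill can be tried locally. This sketch swaps in a different technique from the answers above: a recursive CTE builds the calendar, and a correlated subquery picks the latest row on or before each day. It assumes SQLite through Python's sqlite3 module, a made-up table name readings, and an end date of 2021-01-06 taken from the expected output and passed in as a parameter.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("create table readings (date text, id integer, value integer)")
conn.executemany(
    "insert into readings values (?, ?, ?)",
    [("2021-01-01", 1, 3), ("2021-01-04", 1, 5), ("2021-01-05", 1, 10)],
)

filled = conn.execute("""
    with recursive calendar(day) as (
        -- one row per day from the first reading up to :last_day
        select min(date) from readings
        union all
        select date(day, '+1 day') from calendar where day < :last_day
    )
    select c.day, i.id,
           -- latest value recorded for this id on or before the calendar day
           (select r.value
            from readings r
            where r.id = i.id and r.date <= c.day
            order by r.date desc
            limit 1) as value
    from calendar c
    cross join (select distinct id from readings) i
    order by i.id, c.day
""", {"last_day": "2021-01-06"}).fetchall()

for row in filled:
    print(row)
```

An id whose first reading comes after the start of the calendar would get a null value on the earlier days, so a real query may want to filter those rows out.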

SQL - Query to return active subscriptions on a given day

I have a table that shows when a user signs up for a subscription and when their membership will expire. A user can purchase a new subscription even if their current one is in force.
userid|purchasedate|expirydate
1 |2019-01-01 |2019-02-01
2 |2019-01-02 |2019-02-02
3 |2019-01-03 |2019-02-03
3 |2019-01-04 |2019-03-03
I need a SQL query that will GROUP BY the date and return the number of active subscriptions on that date. So it would return:
date |count
2019-01-01|1
2019-01-02|2
2019-01-03|3
2019-01-04|3
Below is for BigQuery Standard SQL
#standardSQL
SELECT day, COUNT(DISTINCT userid) active_subscriptions
FROM (SELECT AS STRUCT MIN(purchasedate) min_date, MAX(expirydate) max_date FROM `project.dataset.table`),
UNNEST(GENERATE_DATE_ARRAY(min_date, max_date)) day
JOIN `project.dataset.table`
ON day BETWEEN purchasedate AND expirydate
GROUP BY day
You can test and play with the above using dummy data from your question, as in the example below:
#standardSQL
WITH `project.dataset.table` AS (
SELECT 1 userid, DATE '2019-01-01' purchasedate, DATE '2019-02-01' expirydate UNION ALL
SELECT 2, '2019-01-02', '2019-02-02' UNION ALL
SELECT 3, '2019-01-03', '2019-02-03' UNION ALL
SELECT 3, '2019-01-04', '2019-03-03'
)
SELECT day, COUNT(DISTINCT userid) active_subscriptions
FROM (SELECT AS STRUCT MIN(purchasedate) min_date, MAX(expirydate) max_date FROM `project.dataset.table`),
UNNEST(GENERATE_DATE_ARRAY(min_date, max_date)) day
JOIN `project.dataset.table`
ON day BETWEEN purchasedate AND expirydate
GROUP BY day
with the output below:
Row day active_subscriptions
1 2019-01-01 1
2 2019-01-02 2
3 2019-01-03 3
4 2019-01-04 3
5 2019-01-05 3
6 2019-01-06 3
... ... ...
... ... ...
31 2019-01-31 3
32 2019-02-01 3
33 2019-02-02 2
34 2019-02-03 1
35 2019-02-04 1
... ... ...
... ... ...
61 2019-03-02 1
62 2019-03-03 1
You need a list of dates and count(distinct):
select d.dte, count(distinct t.userid) as num_users
from (select distinct purchasedate as dte from t) d left join
t
on d.dte >= t.purchasedate and
d.dte <= t.expirydate
group by d.dte
order by d.dte;
EDIT:
BigQuery can be fickle about inequalities in the on clause. Here is another approach:
select dte, count(distinct t.userid) as num_users
from t cross join
unnest(generate_date_array(t.purchasedate, t.expirydate, interval 1 day)) dte
group by dte
order by dte;
You can use a where clause to filter down to particular dates.
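The join against a list of dates can be checked on the four rows from the question. This is a small sketch assuming SQLite through Python's sqlite3 module and a made-up table name subscriptions. It uses the purchase dates as the list of dates, so it reports only the four days shown in the desired output.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "create table subscriptions (userid integer, purchasedate text, expirydate text)"
)
conn.executemany("insert into subscriptions values (?, ?, ?)", [
    (1, "2019-01-01", "2019-02-01"),
    (2, "2019-01-02", "2019-02-02"),
    (3, "2019-01-03", "2019-02-03"),
    (3, "2019-01-04", "2019-03-03"),
])

# count(distinct userid) keeps user 3 from being counted twice on days
# covered by both of their overlapping subscriptions.
active = conn.execute("""
    select d.dte, count(distinct s.userid) as num_users
    from (select distinct purchasedate as dte from subscriptions) d
    left join subscriptions s
      on d.dte >= s.purchasedate and d.dte <= s.expirydate
    group by d.dte
    order by d.dte
""").fetchall()

print(active)
```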
I named the table 'test_expirydate', loaded your data, and this query works:
select
tb1.expirydate,
count(*) as total
from test_expirydate as tb1
left join (
select
expirydate
from test_expirydate as tb2
group by userid
) as tb2
on tb1.expirydate >= tb2.expirydate
group by tb1.expirydate
I'm not sure whether it works in other cases, but it is fine with the current data.
Oh, I interpreted the left column as the expiration date.

sql Select Earliest Date Multiple Rows

I have the following data:
id from_date to_date empty
1 24/03/2016 01/04/2016 Y
1 01/04/2016 23/06/2016 Y
1 05/08/2016 01/04/2017 Y
1 01/04/2017 01/04/2018 Y
1 01/04/2018 01/04/2019 Y
The current date falls between 01/04/2018 and 01/04/2019; however, the earliest consecutive date is 05/08/2016. How can I write an SQL script to pick up the earliest from_date for the period that includes today?
Is this possible without creating a temporary table and updating the from date for each id where the from_date equals the to_date of the previous row?
Hope that all makes sense.
Thanks
Iain
You seem to want to group the values together. Here is one method to get the periods of the continuous dates:
select id, min(from_date), max(to_date)
from (select t.*,
sum(case when prev_to_date = from_date then 0 else 1 end) over (partition by id order by from_date) as grp
from (select t.*,
lag(to_date) over (partition by id order by from_date) as prev_to_date
from t
) t
) t
group by id, grp;
For filtering, you can add:
having current_date >= min(from_date) and current_date <= max(to_date)
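The grouping can be tried on the rows from the question. In this sketch a row starts a new group whenever its from_date differs from the previous row's to_date, and a running sum of those starts numbers the groups. It assumes SQLite through Python's sqlite3 module, a made-up table name periods, dates rewritten in ISO format, and a fixed stand-in for today (2018-06-01) so the result does not depend on when it is run.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("create table periods (id integer, from_date text, to_date text)")
conn.executemany("insert into periods values (?, ?, ?)", [
    (1, "2016-03-24", "2016-04-01"),
    (1, "2016-04-01", "2016-06-23"),
    (1, "2016-08-05", "2017-04-01"),
    (1, "2017-04-01", "2018-04-01"),
    (1, "2018-04-01", "2019-04-01"),
])

current = conn.execute("""
    select id, min(from_date) as period_start, max(to_date) as period_end
    from (select s.*,
                 -- running count of "gap" rows: each gap opens a new group
                 sum(case when prev_to_date = from_date then 0 else 1 end)
                     over (partition by id order by from_date) as grp
          from (select p.*,
                       lag(to_date) over (partition by id order by from_date) as prev_to_date
                from periods p) s) g
    group by id, grp
    having :today >= min(from_date) and :today <= max(to_date)
""", {"today": "2018-06-01"}).fetchall()

print(current)
```

The gap between 2016-06-23 and 2016-08-05 splits the rows into two groups, and only the second one contains the chosen date.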

Retrieve rows for time interval but also previous row of each - how to?

I have a table like this:
Id FKId Amount1 Amount2 Date
-----------------------------------------------------
1 1 100,0000 33,0000 2018-01-18 19:57:39.403
2 2 50,0000 10,0000 2018-01-19 19:57:57.097
3 1 130,0000 40,0000 2018-01-20 19:58:13.660
5 2 44,0000 2,0000 2018-01-21 11:11:00.000
How can I get rows 3 - 5 (all that have dates 2018-01-20 or 2018-01-21) but also the previous row for each of their FKIds (1 and 2)?
Thank you
In most databases, you can use the ANSI standard lead() function:
select t.*
from (select t.*, lead(date) over (partition by fkid order by date) as next_date
from t
) t
-- Date holds a time component, so compare against a half-open range
where (date >= '2018-01-20' and date < '2018-01-22') or
(next_date >= '2018-01-20' and next_date < '2018-01-22');
Alternatively, if you just want all records on or after some date plus the previous record for each fkid, this logic also works:
select t.*
from t
where t.date >= (select max(t2.date)
from t t2
where t2.fkid = t.fkid and t2.date < '2018-01-20'
);
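The lead() approach can be checked on the rows from the question. This sketch assumes SQLite through Python's sqlite3 module and a made-up table name payments, and it leaves out the amount columns. The row with Id 0 is not in the question: it is a hypothetical older row added only to show that just the immediately preceding row per FKId is returned. Because Date carries a time of day, the filter uses a half-open range instead of equality.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("create table payments (id integer, fkid integer, date text)")
conn.executemany("insert into payments values (?, ?, ?)", [
    (0, 1, "2018-01-10 08:00:00.000"),  # hypothetical extra row, not from the question
    (1, 1, "2018-01-18 19:57:39.403"),
    (2, 2, "2018-01-19 19:57:57.097"),
    (3, 1, "2018-01-20 19:58:13.660"),
    (5, 2, "2018-01-21 11:11:00.000"),
])

rows = conn.execute("""
    select id
    from (select p.*,
                 lead(date) over (partition by fkid order by date) as next_date
          from payments p) t
    -- keep rows inside the interval, plus rows whose successor is inside it
    where (date >= :start and date < :stop)
       or (next_date >= :start and next_date < :stop)
    order by id
""", {"start": "2018-01-20", "stop": "2018-01-22"}).fetchall()

ids = [r[0] for r in rows]
print(ids)
```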