Longest streak using Standard SQL

I have a table with fields:
user_id
tracking_date
with values
1, 2017-12-23
2, 2017-12-23
1, 2017-12-24
1, 2017-12-25
2, 2017-12-26
3, 2017-12-26
1, 2017-12-27
2, 2017-12-27
I would like to find the longest streak for each user as of today, so the output for the data above would be:
1, 1
2, 2
3, 0
Is there a way to achieve this output in a single SQL query?

This is tricky. For each user_id, you want the most recent date, and the latest date that has no record on the previous day (the start of the current streak):
select user_id,
       (case when max(tracking_date) <> current_date then 0
             else (current_date -
                   max(case when prev_td is distinct from tracking_date - interval '1 day'
                            then tracking_date
                       end) + 1
                  )
        end) as seq
from (select t.*,
             lag(tracking_date) over (partition by user_id order by tracking_date) as prev_td
      from t
     ) t
group by user_id;
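The same gap-detection logic can be sanity-checked with SQLite's window functions. This is a sketch, not the Postgres query verbatim: SQLite has no interval arithmetic, so date() and julianday() stand in for it, and "today" is pinned to 2017-12-27 so the sample data produces the expected output.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE t (user_id INT, tracking_date TEXT);
INSERT INTO t VALUES
  (1,'2017-12-23'),(2,'2017-12-23'),(1,'2017-12-24'),(1,'2017-12-25'),
  (2,'2017-12-26'),(3,'2017-12-26'),(1,'2017-12-27'),(2,'2017-12-27');
""")

rows = conn.execute("""
WITH w AS (
  SELECT user_id, tracking_date,
         LAG(tracking_date) OVER (PARTITION BY user_id ORDER BY tracking_date) AS prev_td
  FROM t
)
SELECT user_id,
       CASE WHEN MAX(tracking_date) <> :today THEN 0
            -- streak start = latest date whose previous day has no record;
            -- streak length = today - streak start + 1
            ELSE CAST(julianday(:today) -
                      julianday(MAX(CASE WHEN prev_td IS NOT date(tracking_date, '-1 day')
                                         THEN tracking_date END)) AS INT) + 1
       END AS streak
FROM w
GROUP BY user_id
ORDER BY user_id
""", {"today": "2017-12-27"}).fetchall()
print(rows)  # [(1, 1), (2, 2), (3, 0)]
```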

Related

How can I write BigQuery SQL to group data by start and end date of column changing?

ID|FLAG|TMST
1|1|2022-01-01
1|1|2022-01-02
...(all dates between 01-02 and 02-05 have 1, there are rows for all these dates)
1|1|2022-02-15
1|0|2022-02-16
1|0|2022-02-17
...(all dates between 02-17 and 05-15 have 0, there are rows for all these dates)
1|0|2022-05-15
1|1|2022-05-16
->
ID|FLAG|STRT_MONTH|END_MONTH
1|1|202201|202202
1|0|202203|202204
1|1|202205|999912
I have the first dataset and am trying to produce the second. How can I write BigQuery SQL to group by the ID and get the start and end month of each stretch where the flag stays the same? If a specific month has both 0 and 1 flags, like month 202202, I would like to treat that month as a 1.
You might consider the gaps-and-islands approach below.
WITH sample_table AS (
  SELECT 1 id, 1 flag, DATE '2022-01-01' tmst UNION ALL
  SELECT 1 id, 1 flag, '2022-01-02' tmst UNION ALL
  -- (all dates between 01-02 and 02-05 have 1, there are rows for all these dates)
  SELECT 1 id, 1 flag, '2022-02-15' tmst UNION ALL
  SELECT 1 id, 0 flag, '2022-02-16' tmst UNION ALL
  SELECT 1 id, 0 flag, '2022-02-17' tmst UNION ALL
  SELECT 1 id, 0 flag, '2022-03-01' tmst UNION ALL
  SELECT 1 id, 0 flag, '2022-04-01' tmst UNION ALL
  -- (all dates between 02-17 and 05-15 have 0, there are rows for all these dates)
  SELECT 1 id, 0 flag, '2022-05-15' tmst UNION ALL
  SELECT 1 id, 1 flag, '2022-05-16' tmst
),
aggregation AS (
  SELECT id, DATE_TRUNC(tmst, MONTH) month, IF(SUM(flag) > 0, 1, 0) flag
  FROM sample_table
  GROUP BY 1, 2
)
SELECT id, ANY_VALUE(flag) flag,
       MIN(month) start_month,
       IF(MAX(month) = ANY_VALUE(max_month), '9999-12-01', MAX(month)) end_month
FROM (
  SELECT * EXCEPT(gap), COUNTIF(gap) OVER w1 AS part
  FROM (
    SELECT *, flag <> LAG(flag) OVER w0 AS gap,
           MAX(month) OVER (PARTITION BY id) AS max_month
    FROM aggregation
    WINDOW w0 AS (PARTITION BY id ORDER BY month)
  )
  WINDOW w1 AS (PARTITION BY id ORDER BY month)
)
GROUP BY 1, part;
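As a portable sanity check of the same gaps-and-islands idea, here is a SQLite translation (a sketch: months become 'YYYY-MM' strings instead of DATE_TRUNC, MAX(flag) replaces IF(SUM(flag) > 0, 1, 0), and SUM over a boolean replaces COUNTIF):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE sample_table (id INT, flag INT, tmst TEXT);
INSERT INTO sample_table VALUES
  (1,1,'2022-01-01'),(1,1,'2022-01-02'),(1,1,'2022-02-15'),
  (1,0,'2022-02-16'),(1,0,'2022-02-17'),(1,0,'2022-03-01'),
  (1,0,'2022-04-01'),(1,0,'2022-05-15'),(1,1,'2022-05-16');
""")

rows = conn.execute("""
WITH monthly AS (            -- collapse to one row per month; any 1 makes the month a 1
  SELECT id, strftime('%Y-%m', tmst) AS month, MAX(flag) AS flag
  FROM sample_table GROUP BY 1, 2
), flagged AS (              -- mark months where the flag changed vs. the prior month
  SELECT *, flag <> LAG(flag, 1, flag) OVER w AS gap,
         MAX(month) OVER (PARTITION BY id) AS max_month
  FROM monthly WINDOW w AS (PARTITION BY id ORDER BY month)
), parts AS (                -- running count of changes = island id
  SELECT *, SUM(gap) OVER (PARTITION BY id ORDER BY month) AS part
  FROM flagged
)
SELECT id, flag, MIN(month) AS start_month,
       CASE WHEN MAX(month) = MAX(max_month) THEN '9999-12'
            ELSE MAX(month) END AS end_month
FROM parts GROUP BY id, part ORDER BY start_month
""").fetchall()
print(rows)
# [(1, 1, '2022-01', '2022-02'), (1, 0, '2022-03', '2022-04'), (1, 1, '2022-05', '9999-12')]
```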

Use SQL to get monthly churn count and churn rate

Currently using Postgres 9.5
I want to calculate monthly churn_count and churn_rate of the search function.
churn_count: number of users who used the search function last month but not this month
churn_rate: churn_count/total_users_last_month
My dummy data is:
CREATE TABLE yammer_events (
occurred_at TIMESTAMP,
user_id INT,
event_name VARCHAR(50)
);
INSERT INTO yammer_events (occurred_at, user_id, event_name) VALUES
('2014-06-01 00:00:01', 1, 'search_autocomplete'),
('2014-06-01 00:00:01', 2, 'search_autocomplete'),
('2014-07-01 00:00:01', 1, 'search_run'),
('2014-07-01 00:00:02', 1, 'search_run'),
('2014-07-01 00:00:01', 2, 'search_run'),
('2014-07-01 00:00:01', 3, 'search_run'),
('2014-08-01 00:00:01', 1, 'search_run'),
('2014-08-01 00:00:01', 4, 'search_run');
Ideal output should be:
|month |churn_count|churn_rate_percentage|
|--- |--- |--- |
|2014-07-01|0 |0
|2014-08-01|2 |66.6 |
In June: user 1, 2 (2 users)
In July: user 1, 2, 3 (3 users)
In August: user 1, 4 (2 users)
In July, we didn't lose any customer. In August, we lost customer 2 and 3, so the churn_count is 2, and the rate is 2/3*100 = 66.6
I tried the following query to calculate churn_count, but the result is really weird.
WITH monthly_activity AS (
SELECT distinct DATE_TRUNC('month', occurred_at) AS month,
user_id
FROM yammer_events
WHERE event_name LIKE 'search%'
)
SELECT last_month.month+INTERVAL '1 month', COUNT(DISTINCT last_month.user_id)
FROM monthly_activity last_month
LEFT JOIN monthly_activity this_month
ON last_month.user_id = this_month.user_id
AND this_month.month = last_month.month + INTERVAL '1 month'
AND this_month.user_id IS NULL
GROUP BY 1
db<>fiddle
Thank you in advance!
An easy way to do it would be to aggregate the users into an array, and from there extract and count the users present in the previous month but missing from the current one, using the window function LAG() and the intarray difference operator, e.g.
WITH j AS (
SELECT date_trunc('month',occurred_at::date) AS month,
array_agg(distinct user_id) AS users,
count(distinct user_id) AS total_users
FROM yammer_events
GROUP BY 1
ORDER BY 1
)
SELECT month::date,
       cardinality(LAG(users) OVER w - users) AS churn_count,
       (cardinality(LAG(users) OVER w - users)::numeric /
        (LAG(total_users) OVER w)::numeric) * 100 AS churn_rate_percentage
FROM j
WINDOW w AS (ORDER BY month
             ROWS BETWEEN 1 PRECEDING AND CURRENT ROW);
month | churn_count | churn_rate_percentage
------------+-------------+-------------------------
2014-06-01 | |
2014-07-01 | 0 | 0.00000000000000000000
2014-08-01 | 2 | 66.66666666666666666700
(3 rows)
Note: this query relies on the intarray extension. In case you don't have it on your system, just run:
CREATE EXTENSION intarray;
Demo: db<>fiddle
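The set-difference idea the intarray operator implements is easy to verify outside the database; here is a minimal Python sketch of the same month-over-month logic (event timestamps reduced to dates for brevity):

```python
from collections import defaultdict

# (date, user_id) pairs from the sample search events
events = [
    ('2014-06-01', 1), ('2014-06-01', 2),
    ('2014-07-01', 1), ('2014-07-01', 1), ('2014-07-01', 2), ('2014-07-01', 3),
    ('2014-08-01', 1), ('2014-08-01', 4),
]

monthly = defaultdict(set)                  # month -> set of active users
for day, user_id in events:
    monthly[day[:7]].add(user_id)

report = []
months = sorted(monthly)
for prev, cur in zip(months, months[1:]):
    churned = monthly[prev] - monthly[cur]  # active last month, absent this month
    rate = round(100 * len(churned) / len(monthly[prev]), 1)
    report.append((cur, len(churned), rate))
print(report)  # [('2014-07', 0, 0.0), ('2014-08', 2, 66.7)]
```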
WITH monthly_activity AS (
SELECT distinct DATE_TRUNC('month', occurred_at) AS month,
user_id
FROM yammer_events
WHERE event_name LIKE 'search%'
)
SELECT
last_month.month+INTERVAL '1 month',
SUM(CASE WHEN this_month.month IS NULL THEN 1 ELSE 0 END) AS churn_count,
SUM(CASE WHEN this_month.month IS NULL THEN 1 ELSE 0 END)*1.00/COUNT(DISTINCT last_month.user_id)*100 AS churn_rate_percentage
FROM monthly_activity last_month
LEFT JOIN monthly_activity this_month
ON last_month.month + INTERVAL '1 month' = this_month.month
AND last_month.user_id = this_month.user_id
GROUP BY 1
ORDER BY 1
LIMIT 2
I think my way is more circuitous but easier for beginners to understand. Just for your reference.
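The LEFT JOIN version can likewise be checked in SQLite. This is a sketch, not the Postgres query verbatim: strftime modifiers replace interval arithmetic, and LIMIT 2 drops the final month, which has no following month to compare against.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE yammer_events (occurred_at TEXT, user_id INT, event_name TEXT);
INSERT INTO yammer_events VALUES
  ('2014-06-01 00:00:01', 1, 'search_autocomplete'),
  ('2014-06-01 00:00:01', 2, 'search_autocomplete'),
  ('2014-07-01 00:00:01', 1, 'search_run'),
  ('2014-07-01 00:00:02', 1, 'search_run'),
  ('2014-07-01 00:00:01', 2, 'search_run'),
  ('2014-07-01 00:00:01', 3, 'search_run'),
  ('2014-08-01 00:00:01', 1, 'search_run'),
  ('2014-08-01 00:00:01', 4, 'search_run');
""")

rows = conn.execute("""
WITH monthly_activity AS (
  SELECT DISTINCT strftime('%Y-%m', occurred_at) AS month, user_id
  FROM yammer_events WHERE event_name LIKE 'search%'
)
SELECT strftime('%Y-%m', lm.month || '-01', '+1 month') AS month,
       SUM(tm.month IS NULL) AS churn_count,          -- unmatched = churned
       ROUND(SUM(tm.month IS NULL) * 100.0 / COUNT(DISTINCT lm.user_id), 1) AS churn_rate
FROM monthly_activity lm
LEFT JOIN monthly_activity tm
  ON tm.month = strftime('%Y-%m', lm.month || '-01', '+1 month')
 AND tm.user_id = lm.user_id
GROUP BY 1 ORDER BY 1 LIMIT 2
""").fetchall()
print(rows)  # [('2014-07', 0, 0.0), ('2014-08', 2, 66.7)]
```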

count grouped records by ID and show as weekly with day/week outputted

I have a table in which I have to count the total records assigned to each USER per week (Monday to Sunday).
Table BooksIssued
BOOKID USER DATE
1 A 20211001
2 A 20211002
3 A 20211003
4 A 20211004
5 B 20211009
6 C 20211008
7 C 20211008
20211001 is friday.
output of sql query is as follows, the WEEKDATE column shows the week end date (i.e sunday)
WEEKCOUNT USER WEEKDATE
3 A 10/03
1 A 10/10
1 B 10/10
2 C 10/10
I am unable to get a date containing the day in the output, since grouping is done by user and the week part of the date. Please suggest how to get the above output.
I am using Vertica.
Below is a sample query I tried (though I could not get the day part of the date):
SELECT USER, date_part('WEEK', date)) as WEEKDATE
       SUM(CASE WHEN DATE >= timestampadd(WEEK, DATEDIFF(WEEK, date('1900-01-01 00:00:00.000'), date(sysdate)), date('1900-01-01 00:00:00.000'))
                AND  DATE <  timestampadd(WEEK, DATEDIFF(WEEK, date('1900-01-01 00:00:00.000'), date(sysdate)) + 1, date('1900-01-01 00:00:00.000'))
                THEN 1 ELSE 0 END) AS WEEKCOUNT,
FROM   BOOKSISSUED
GROUP BY USER, date_part('WEEK', date)
When I add date_part('DAY', date) to the select clause, I get an error since it's not in the group by.
Please help.
Do you mean this?
WITH
-- your input ...
indata(BOOKID,USR,DT) AS (
SELECT 1,'A',DATE '20211001'
UNION ALL SELECT 2,'A',DATE '20211002'
UNION ALL SELECT 3,'A',DATE '20211003'
UNION ALL SELECT 4,'A',DATE '20211004'
UNION ALL SELECT 5,'B',DATE '20211009'
UNION ALL SELECT 6,'C',DATE '20211008'
UNION ALL SELECT 7,'C',DATE '20211008'
)
SELECT
COUNT(*) AS week_count
, usr
, TO_CHAR(
DATE_TRUNC('WEEK',dt) + INTERVAL '6 DAYS'
, 'MM/DD'
) AS trcweek
FROM indata
GROUP BY 2,3
ORDER BY 2,3
;
week_count | usr | trcweek
------------+-----+---------
3 | A | 10/03
1 | A | 10/10
1 | B | 10/10
2 | C | 10/10
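The week-ending-Sunday trick translates to other databases too; in SQLite, for example, the 'weekday 0' modifier jumps to the next-or-same Sunday, which plays the role of DATE_TRUNC('WEEK', dt) + INTERVAL '6 DAYS'. A sketch with the same sample data:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE BooksIssued (bookid INT, usr TEXT, dt TEXT);
INSERT INTO BooksIssued VALUES
  (1,'A','2021-10-01'),(2,'A','2021-10-02'),(3,'A','2021-10-03'),
  (4,'A','2021-10-04'),(5,'B','2021-10-09'),(6,'C','2021-10-08'),
  (7,'C','2021-10-08');
""")

rows = conn.execute("""
SELECT COUNT(*) AS week_count, usr,
       strftime('%m/%d', date(dt, 'weekday 0')) AS weekdate  -- next-or-same Sunday
FROM BooksIssued
GROUP BY 2, 3
ORDER BY 2, 3
""").fetchall()
print(rows)  # [(3, 'A', '10/03'), (1, 'A', '10/10'), (1, 'B', '10/10'), (2, 'C', '10/10')]
```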
Please also check the SQL query syntax: the second column of the SELECT clause had an extra parenthesis and a missing trailing comma, and must match the second column of the group by clause:
SELECT USER, date_part('WEEK', date) as WEEKDATE,
SUM(CASE WHEN DATE >= timestampadd(WEEK, DATEDIFF(WEEK, date('1900-01-01 00:00:00.000'), date(sysdate)), date('1900-01-01 00:00:00.000'))
AND DATE < timestampadd(WEEK, DATEDIFF(WEEK, date('1900-01-01 00:00:00.000'), date(sysdate)) + 1, date('1900-01-01 00:00:00.000'))
THEN 1 ELSE 0 END) AS WEEKCOUNT
FROM BOOKSISSUED
GROUP BY USER, date_part('WEEK', date)

SQL: How to create a weekly user count summary by month

I’m trying to create a week-over-week active user count summary report/table aggregated by month. I have one table for June and one table for May, which I need to join together. The date timestamp is created_utc, a UNIX timestamp, which I can transform into a human-readable format and from there extract the week-of-year value (1 through 52). The questions I have are:
1. Numbering the weeks just by values of 1 through 4, so week 1 for June, week 1 for May, week 2 for June, week 2 for May, and so on.
2. Joining the tables on those week 1 through 4 values.
3. Pivoting the table and adding a WOW_Change variable.
I'd like the final table to look like this:
| Week | June_count | May_count |WOW_Change |
|:-----------|:-----------:|:------------:|:----------:
| Week_1 | 5 | 8 | 0.6 |
| Week_2 | 2 | 1 | -0.5 |
| Week_3 | 10 | 5 | -0.5 |
| Week_4 | 30 | 6 | 1 |
Below is some sample data as well as the code I've started.
CREATE TABLE June
(created_utc int, userid varchar(6))
;
INSERT INTO June
(created_utc, userid)
VALUES
(1496354167, '6eq4xf'),
(1496362973, '6eqzz3'),
(1496431934, '6ewlm8'),
(1496870877, '6fwied'),
(1496778080, '6fo79k'),
(1496933893, '6g1gcg'),
(1497154559, '6gjkid'),
(1497618561, '6hmeud'),
(1497377349, '6h1osm'),
(1497221017, '6god73'),
(1497731470, '6hvmic'),
(1497273130, '6gs4ay'),
(1498080798, '6ioz8q'),
(1497769316, '6hyer4'),
(1497415729, '6h5cgu'),
(1497978764, '6iffwq')
;
CREATE TABLE May
(created_utc int, userid varchar(6))
;
INSERT INTO May
(created_utc, userid)
VALUES
(1493729491, '68sx7k'),
(1493646801, '68m2s2'),
(1493747285, '68uohf'),
(1493664087, '68ntss'),
(1493690759, '68qe5k'),
(1493829196, '691fy9'),
(1493646344, '68m1dv'),
(1494166859, '69rhkl'),
(1493883023, '6963qb'),
(1494362328, '6a83wv'),
(1494525998, '6alv6c'),
(1493945230, '69bkhb'),
(1494050355, '69jqtz'),
(1494418011, '6accd0'),
(1494425781, '6ad0xm'),
(1494024697, '69hx2z'),
(1494586576, '6aql9y')
;
#standardSQL
SELECT created_utc,
DATE(TIMESTAMP_SECONDS(created_utc)) as event_date,
CAST(EXTRACT(WEEK FROM TIMESTAMP_SECONDS(created_utc)) AS STRING) AS week_number,
COUNT(distinct userid) as user_count
FROM June
SELECT created_utc,
DATE(TIMESTAMP_SECONDS(created_utc)) as event_date,
CAST(EXTRACT(WEEK FROM TIMESTAMP_SECONDS(created_utc)) AS STRING) AS week_number,
COUNT(distinct userid) as user_count
FROM May
Below is for BigQuery Standard SQL
#standardSQL
SELECT
CONCAT('Week_', CAST(week AS STRING)) Week,
June.user_count AS June_count,
May.user_count AS May_count,
ROUND((May.user_count - June.user_count) / June.user_count, 2) AS WOW_Change
FROM (
SELECT COUNT(DISTINCT userid) user_count,
DIV(EXTRACT(DAY FROM DATE(TIMESTAMP_SECONDS(created_utc))) - 1, 7) + 1 week
FROM `project.dataset.June`
GROUP BY week
) June
JOIN (
SELECT COUNT(DISTINCT userid) user_count,
DIV(EXTRACT(DAY FROM DATE(TIMESTAMP_SECONDS(created_utc))) - 1, 7) + 1 week
FROM `project.dataset.May`
GROUP BY week
) May
USING(week)
You can test and play with the above using the sample data from your question, as in the example below:
#standardSQL
WITH `project.dataset.June` AS (
SELECT 1496354167 created_utc, '6eq4xf' userid UNION ALL
SELECT 1496362973, '6eqzz3' UNION ALL
SELECT 1496431934, '6ewlm8' UNION ALL
SELECT 1496870877, '6fwied' UNION ALL
SELECT 1496778080, '6fo79k' UNION ALL
SELECT 1496933893, '6g1gcg' UNION ALL
SELECT 1497154559, '6gjkid' UNION ALL
SELECT 1497618561, '6hmeud' UNION ALL
SELECT 1497377349, '6h1osm' UNION ALL
SELECT 1497221017, '6god73' UNION ALL
SELECT 1497731470, '6hvmic' UNION ALL
SELECT 1497273130, '6gs4ay' UNION ALL
SELECT 1498080798, '6ioz8q' UNION ALL
SELECT 1497769316, '6hyer4' UNION ALL
SELECT 1497415729, '6h5cgu' UNION ALL
SELECT 1497978764, '6iffwq'
), `project.dataset.May` AS (
SELECT 1493729491 created_utc, '68sx7k' userid UNION ALL
SELECT 1493646801, '68m2s2' UNION ALL
SELECT 1493747285, '68uohf' UNION ALL
SELECT 1493664087, '68ntss' UNION ALL
SELECT 1493690759, '68qe5k' UNION ALL
SELECT 1493829196, '691fy9' UNION ALL
SELECT 1493646344, '68m1dv' UNION ALL
SELECT 1494166859, '69rhkl' UNION ALL
SELECT 1493883023, '6963qb' UNION ALL
SELECT 1494362328, '6a83wv' UNION ALL
SELECT 1494525998, '6alv6c' UNION ALL
SELECT 1493945230, '69bkhb' UNION ALL
SELECT 1494050355, '69jqtz' UNION ALL
SELECT 1494418011, '6accd0' UNION ALL
SELECT 1494425781, '6ad0xm' UNION ALL
SELECT 1494024697, '69hx2z' UNION ALL
SELECT 1494586576, '6aql9y'
)
SELECT
CONCAT('Week_', CAST(week AS STRING)) Week,
June.user_count AS June_count,
May.user_count AS May_count,
ROUND((May.user_count - June.user_count) / June.user_count, 2) AS WOW_Change
FROM (
SELECT COUNT(DISTINCT userid) user_count,
DIV(EXTRACT(DAY FROM DATE(TIMESTAMP_SECONDS(created_utc))) - 1, 7) + 1 week
FROM `project.dataset.June`
GROUP BY week
) June
JOIN (
SELECT COUNT(DISTINCT userid) user_count,
DIV(EXTRACT(DAY FROM DATE(TIMESTAMP_SECONDS(created_utc))) - 1, 7) + 1 week
FROM `project.dataset.May`
GROUP BY week
) May
USING(week)
-- ORDER BY week
with the result below (since the JOIN only keeps weeks present in both months, only two weeks are shown, which should not be an issue when you apply it to real data):
Row Week June_count May_count WOW_Change
1 Week_1 5 12 1.4
2 Week_2 6 5 -0.17
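The DIV(day - 1, 7) + 1 bucketing is plain integer arithmetic, so it can be reproduced outside BigQuery; here is a small Python sketch over the same timestamps:

```python
from datetime import datetime, timezone

june = [(1496354167,'6eq4xf'),(1496362973,'6eqzz3'),(1496431934,'6ewlm8'),
        (1496870877,'6fwied'),(1496778080,'6fo79k'),(1496933893,'6g1gcg'),
        (1497154559,'6gjkid'),(1497618561,'6hmeud'),(1497377349,'6h1osm'),
        (1497221017,'6god73'),(1497731470,'6hvmic'),(1497273130,'6gs4ay'),
        (1498080798,'6ioz8q'),(1497769316,'6hyer4'),(1497415729,'6h5cgu'),
        (1497978764,'6iffwq')]
may = [(1493729491,'68sx7k'),(1493646801,'68m2s2'),(1493747285,'68uohf'),
       (1493664087,'68ntss'),(1493690759,'68qe5k'),(1493829196,'691fy9'),
       (1493646344,'68m1dv'),(1494166859,'69rhkl'),(1493883023,'6963qb'),
       (1494362328,'6a83wv'),(1494525998,'6alv6c'),(1493945230,'69bkhb'),
       (1494050355,'69jqtz'),(1494418011,'6accd0'),(1494425781,'6ad0xm'),
       (1494024697,'69hx2z'),(1494586576,'6aql9y')]

def week_counts(events):
    """Distinct users per week-of-month, where week = DIV(day - 1, 7) + 1."""
    weeks = {}
    for ts, uid in events:
        day = datetime.fromtimestamp(ts, tz=timezone.utc).day
        weeks.setdefault((day - 1) // 7 + 1, set()).add(uid)
    return {w: len(users) for w, users in weeks.items()}

june_w, may_w = week_counts(june), week_counts(may)
# inner join on week number, like the USING(week) in the query
result = [(f'Week_{w}', june_w[w], may_w[w],
           round((may_w[w] - june_w[w]) / june_w[w], 2))
          for w in sorted(june_w.keys() & may_w.keys())]
print(result)  # [('Week_1', 5, 12, 1.4), ('Week_2', 6, 5, -0.17)]
```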
Use arithmetic on the day of the month to get the week:
SELECT j.week_number, j.user_count as june_user_count,
       m.user_count as may_user_count
FROM (SELECT DIV(EXTRACT(DAY FROM DATE(TIMESTAMP_SECONDS(created_utc))) - 1, 7) as week_number,
             COUNT(distinct userid) as user_count
      FROM June
      GROUP BY week_number
     ) j JOIN
     (SELECT DIV(EXTRACT(DAY FROM DATE(TIMESTAMP_SECONDS(created_utc))) - 1, 7) as week_number,
             COUNT(distinct userid) as user_count
      FROM May
      GROUP BY week_number
     ) m
     ON m.week_number = j.week_number;
Note that splitting data into different tables based only on the date is a bad idea. The data should all go into one table, perhaps partitioned if data volume is an issue.

I'd like to group by number of days (+ or -) and use min date

ID Date Count
1, 2014-05-01 1
1, 2014-05-04 1
1, 2014-05-10 1
2, 2014-05-02 1
2, 2014-05-03 1
2, 2014-05-09 1
if I was to group where the time difference +/- 5 days, this would become
ID Date Count
1, 2014-05-01 2
1, 2014-05-10 1
2, 2014-05-02 2
2, 2014-05-09 1
Is this possible in SQL Server 2012? Any pointers would be greatly appreciated. Thanks
I think you want to start a new group when there is a gap of five or more days. So, if you had a record (1, 2014-05-07), then id 1 would have only one group.
If so, the following will work:
select id, min([date]), sum([count])
from (select t.*,
             sum(HasGap) over (partition by id order by [date]) as grpid
      from (select t.*,
                   (case when datediff(day,
                                       lag([date]) over (partition by id order by [date]),
                                       [date]) < 5
                         then 0 else 1
                    end) as HasGap
            from yourtable t
           ) t
     ) t
group by id, grpid;
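A SQLite sketch of the same query, for anyone who wants to see it run (the table name `visits` is made up for the demo, and julianday() differences stand in for DATEDIFF):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE visits (id INT, dt TEXT, cnt INT);
INSERT INTO visits VALUES
  (1,'2014-05-01',1),(1,'2014-05-04',1),(1,'2014-05-10',1),
  (2,'2014-05-02',1),(2,'2014-05-03',1),(2,'2014-05-09',1);
""")

rows = conn.execute("""
SELECT id, MIN(dt) AS dt, SUM(cnt) AS cnt
FROM (SELECT *, SUM(hasgap) OVER (PARTITION BY id ORDER BY dt) AS grpid
      FROM (SELECT *,
                   -- 1 starts a new group: first row per id, or a gap of 5+ days
                   CASE WHEN julianday(dt) -
                             julianday(LAG(dt) OVER (PARTITION BY id ORDER BY dt)) < 5
                        THEN 0 ELSE 1 END AS hasgap
            FROM visits))
GROUP BY id, grpid
ORDER BY id, dt
""").fetchall()
print(rows)
# [(1, '2014-05-01', 2), (1, '2014-05-10', 1), (2, '2014-05-02', 2), (2, '2014-05-09', 1)]
```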