Calculate weekly retention in Google BigQuery - SQL

I have a large table in Google BigQuery with two columns on which I want to compute retention:
Date user
2021-02-03 08:35:07 UTC foo#abc.com
2021-02-03 08:35:07 UTC foo1#abc.com
2021-02-04 08:35:07 UTC foo2#abc.com
2021-02-05 08:35:07 UTC foo#abc.com
2021-02-03 08:35:07 UTC foo1#abc.com
2021-02-10 08:35:07 UTC foo#abc.com
2021-02-13 08:35:07 UTC foo1#abc.com
2021-02-18 08:35:07 UTC foo3#abc.com
2021-02-21 08:35:07 UTC foo2#abc.com
2021-02-23 08:35:07 UTC foo2#abc.com
2021-02-24 08:35:07 UTC foo5#abc.com
2021-02-24 08:35:07 UTC foo2#abc.com
I want to calculate retention as follows:
percentage of unique users from week 1 who are also present in week 2
percentage of unique users from week 2 who are also present in week 3, and so on.
The desired output format is:
week2 week3 week4
23% 56% 33%
I want to run this over a time frame such as one month or six months, and whatever time frame I choose, the output should be in the format above.
I want a solution for BigQuery, but a MySQL solution would also help.

Here is a possible solution:
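WITH leads AS (
  SELECT
    user,
    EXTRACT(ISOWEEK FROM `Date`) AS visit_week,
    -- Look up the user's next visit and take its week. If the user comes back
    -- the following week, next_visit_week = visit_week + 1.
    EXTRACT(ISOWEEK FROM LEAD(`Date`) OVER (PARTITION BY user ORDER BY `Date`)) AS next_visit_week
  FROM
    `your_project`.`your_dataset`.`your_table`
  -- To restrict the time frame (e.g. one or six months), add a filter such as:
  -- WHERE `Date` BETWEEN TIMESTAMP('2021-02-01') AND TIMESTAMP('2021-07-31')
)
SELECT
  visit_week + 1 AS `week`,
  SUM(CASE
        WHEN visit_week = next_visit_week - 1
        THEN 1
        ELSE 0
      END
  ) / COUNT(DISTINCT user) * 100 AS retention_pct
FROM
  leads
GROUP BY
  visit_week
ORDER BY
  `week`
-- NB: ISOWEEK restarts at 1 every year, so a return visit from the last week of
-- one year into week 1 of the next is not counted. For ranges that cross a year
-- boundary, compare DATE_TRUNC(DATE(`Date`), ISOWEEK) values instead.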
For each week, count how many users have their next visit in the following week (NB: this can happen at most once per user per week, because only a user's last visit of the week can have its next visit in a later week). Divide that count by the number of distinct users in the week.
This gives the retention rate for the following week (hence the '+1' in "visit_week+1 AS week").
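To sanity-check the SQL on a small sample, here is a minimal Python sketch of the same definition: for each week, the share of that week's distinct users who are seen again the following week, labelled by the following week. It is an illustration under assumptions, not the answer's method: weeks are keyed by their Monday (the idea behind BigQuery's DATE_TRUNC(..., ISOWEEK)), which sidesteps the year wraparound of EXTRACT(ISOWEEK ...), and the helper names `week_start` and `weekly_retention` and the optional `start`/`end` filter are hypothetical.

```python
from collections import defaultdict
from datetime import date, datetime, timedelta


def week_start(d: date) -> date:
    """Monday of the ISO week containing d (same idea as DATE_TRUNC(d, ISOWEEK))."""
    return d - timedelta(days=d.weekday())


def weekly_retention(events, start=None, end=None):
    """events: iterable of ("YYYY-MM-DD HH:MM:SS UTC", user) pairs.

    Returns {week: pct}, where pct is the percentage of the previous week's
    distinct users who were seen again in `week` (weeks keyed by Monday).
    `start` / `end` (inclusive dates) optionally restrict the time frame.
    """
    users_by_week = defaultdict(set)
    for ts, user in events:
        d = datetime.strptime(ts[:10], "%Y-%m-%d").date()
        if (start and d < start) or (end and d > end):
            continue
        users_by_week[week_start(d)].add(user)

    retention = {}
    for week, users in sorted(users_by_week.items()):
        following = week + timedelta(weeks=1)
        retained = users & users_by_week.get(following, set())
        retention[following] = 100 * len(retained) / len(users)
    return retention


# Sample rows from the question
events = [
    ("2021-02-03 08:35:07 UTC", "foo#abc.com"),
    ("2021-02-03 08:35:07 UTC", "foo1#abc.com"),
    ("2021-02-04 08:35:07 UTC", "foo2#abc.com"),
    ("2021-02-05 08:35:07 UTC", "foo#abc.com"),
    ("2021-02-03 08:35:07 UTC", "foo1#abc.com"),
    ("2021-02-10 08:35:07 UTC", "foo#abc.com"),
    ("2021-02-13 08:35:07 UTC", "foo1#abc.com"),
    ("2021-02-18 08:35:07 UTC", "foo3#abc.com"),
    ("2021-02-21 08:35:07 UTC", "foo2#abc.com"),
    ("2021-02-23 08:35:07 UTC", "foo2#abc.com"),
    ("2021-02-24 08:35:07 UTC", "foo5#abc.com"),
    ("2021-02-24 08:35:07 UTC", "foo2#abc.com"),
]

for week, pct in weekly_retention(events).items():
    print(week, f"{pct:.0f}%")
```

Like the SQL, the last week in the data still produces a row for the week after it (with 0% retention, since nobody can be seen there yet); drop that row if it is not wanted.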

Related

How to filter an aggregate query ordered by date

I need to retrieve aggregate information (by means of SUM) based on two separate dates: created_at and erased_at.
The requirement is as follows:
Column        Description
Month         Month as YYYY-MM
Active_users  Total number of unique users in the month. The users have not been erased during or before the month.
An example set of data in a "users" table is shown below:
id created_at erased_at
20 2017-08-17 08:04 2020-08-04 10:17
27 2017-09-08 13:21 2020-08-04 10:15
31 2017-09-10 11:03 2020-10-01 15:42
61 2017-09-19 10:51 2020-08-17 15:13
71 2017-09-20 06:44 2020-08-04 10:15
80 2017-09-20 10:52 2020-08-04 10:17
217 2017-10-10 06:24 2020-08-04 10:16
247 2017-10-11 14:22 2020-08-04 10:15
249 2017-10-11 22:14 2020-08-04 10:15
256 2017-10-12 11:31 2020-08-04 10:17
428 2017-11-02 13:13 2020-10-01 15:15
649 2017-12-11 11:21 2020-10-01 15:16
651 2017-12-11 11:56 2020-08-04 10:15
810 2018-02-06 09:09 2021-07-29 09:03
811 2018-02-06 09:10 2021-07-29 09:03
833 2018-02-09 14:25 2020-08-04 10:16
968 2018-03-17 04:55 2020-10-15 06:08
The monthly counts of users created in each month can be obtained with:
SELECT
  to_char(users.created_at, 'YYYY-MM') AS Month,
  count(users.id) AS Created_users
FROM users
GROUP BY Month
ORDER BY Month DESC
(The same may be done for erased_at, of course.)
However, how do I write a query that shows, for each month, the number of active users: those who had already been created (not necessarily within that month) and had not yet been erased at the time?
I've tried various subqueries and joins, but this is obviously beyond my pay grade, and I understand it is probably a student-level question.
Please help.
(I'm on PostgreSQL 9.6, in case that matters.)
One method generates the months and then uses a LEFT JOIN and GROUP BY:
select gs.yyyymm, count(u.id) as active_users
from generate_series('2018-01-01'::date, '2018-12-01'::date, interval '1 month') gs(yyyymm) left join
     users u
     on u.created_at < gs.yyyymm + interval '1 month' and
        (u.erased_at >= gs.yyyymm or u.erased_at is null)
group by gs.yyyymm
order by gs.yyyymm;
Note: this counts users active at any point during the month, which is how I interpret your question. You can tweak the query if you actually mean:
Users active for the entire month.
Users active on the first day of the month.
Users active on the last day of the month.
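As a cross-check of the "active at any point during the month" condition, here is a small Python sketch that applies the same test as the join (created before the month ends, and not erased before the month starts, or never erased) to the sample rows from the question. The helpers `parse`, `next_month` and `active_users_per_month` are hypothetical; like the LEFT JOIN with count(u.id), months with no active users still appear with a count of 0.

```python
from datetime import datetime

# Sample rows from the question: id, created_at, erased_at
SAMPLE = """\
20 2017-08-17 08:04 2020-08-04 10:17
27 2017-09-08 13:21 2020-08-04 10:15
31 2017-09-10 11:03 2020-10-01 15:42
61 2017-09-19 10:51 2020-08-17 15:13
71 2017-09-20 06:44 2020-08-04 10:15
80 2017-09-20 10:52 2020-08-04 10:17
217 2017-10-10 06:24 2020-08-04 10:16
247 2017-10-11 14:22 2020-08-04 10:15
249 2017-10-11 22:14 2020-08-04 10:15
256 2017-10-12 11:31 2020-08-04 10:17
428 2017-11-02 13:13 2020-10-01 15:15
649 2017-12-11 11:21 2020-10-01 15:16
651 2017-12-11 11:56 2020-08-04 10:15
810 2018-02-06 09:09 2021-07-29 09:03
811 2018-02-06 09:10 2021-07-29 09:03
833 2018-02-09 14:25 2020-08-04 10:16
968 2018-03-17 04:55 2020-10-15 06:08
"""

FMT = "%Y-%m-%d %H:%M"


def parse(text):
    """Return (created_at, erased_at) pairs; erased_at is None if missing."""
    rows = []
    for line in text.splitlines():
        parts = line.split()
        if not parts:
            continue
        created = datetime.strptime(" ".join(parts[1:3]), FMT)
        erased = datetime.strptime(" ".join(parts[3:5]), FMT) if len(parts) == 5 else None
        rows.append((created, erased))
    return rows


def next_month(d):
    """First day of the month after d."""
    return datetime(d.year + 1, 1, 1) if d.month == 12 else datetime(d.year, d.month + 1, 1)


def active_users_per_month(users, first, last):
    """Count users active at any point in each month from first to last."""
    counts = {}
    month = datetime(first.year, first.month, 1)
    while month <= last:
        month_end = next_month(month)
        counts[month.strftime("%Y-%m")] = sum(
            1
            for created, erased in users
            # same condition as the join: created before the month ends and
            # erased no earlier than the month starts (or never erased)
            if created < month_end and (erased is None or erased >= month)
        )
        month = month_end
    return counts


counts = active_users_per_month(parse(SAMPLE), datetime(2017, 12, 1), datetime(2021, 8, 1))
for month in ("2017-12", "2018-03", "2020-09", "2021-08"):
    print(month, counts[month])
```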
EDIT:
To take the range of months from the data instead of hard-coding it:
select gs.yyyymm, count(u.id) as active_users
-- months run from the first creation month to the last erasure month
from (select date_trunc('month', min(created_at)) as min_month,
             date_trunc('month', max(erased_at)) as max_month
      from users
     ) r cross join lateral
     generate_series(r.min_month, r.max_month, interval '1 month') gs(yyyymm) left join
     users u
     on u.created_at < gs.yyyymm + interval '1 month' and
        (u.erased_at >= gs.yyyymm or u.erased_at is null)
group by gs.yyyymm
order by gs.yyyymm;

Google BigQuery - Create a time series of the number of active records

I'm trying to create a time series in Google BigQuery SQL. My data is a series of time ranges, each covering the period of activity for one record. Here is an example:
Start End
2020-11-01 21:04:00 UTC 2020-11-02 07:15:00 UTC
2020-11-01 21:45:00 UTC 2020-11-02 04:00:00 UTC
2020-11-01 22:00:00 UTC 2020-11-02 09:48:00 UTC
2020-11-01 22:00:00 UTC 2020-11-02 06:00:00 UTC
I want to create a new table that totals the number of active records within each 15-minute block. For example, "21:00:00" would cover 21:00:00 to 21:14:59. My desired output for the above would be:
Period Active_Records
2020-11-01 21:00:00 1
2020-11-01 21:15:00 1
2020-11-01 21:30:00 1
2020-11-01 21:45:00 2
2020-11-01 22:00:00 4
2020-11-01 22:15:00 4
and so on until the end of the last active range.
I would also like to be able to generate this on the fly by querying a date range and having it return every 15-minute block in the range and how many records were active in that period.
Any assistance would be greatly appreciated.
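The query below is for BigQuery Standard SQL:
#standardSQL
-- one row per 15-minute block, from the hour of the earliest start to the latest end
select ts as period, count(1) as Active_Records
from unnest((
  select generate_timestamp_array(timestamp_trunc(min(start), hour), max(`end`), interval 15 minute)
  from `project.dataset.table`
)) ts
join `project.dataset.table`
-- a record is active in a block if its range overlaps [ts, ts + 14:59]
on not (`end` < ts or start > timestamp_add(ts, interval 15 * 60 - 1 second))
group by ts
order by ts
If applied to the sample data from your question, the counts match the desired output above.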
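The same bucketing can be traced in a few lines of Python, which makes the overlap test easy to check by hand: a record counts in a block unless it ends before the block starts or starts after the block's last second. This is a sketch of the query's logic, not BigQuery itself; the helper name `active_records` is hypothetical and timestamps are assumed to be naive UTC values. Note that the inner JOIN drops blocks with no active record; to return every block of a chosen date range on the fly, you would generate the array from fixed bounds instead of min/max and use a LEFT JOIN with count(start) rather than count(1).

```python
from datetime import datetime, timedelta

BLOCK = timedelta(minutes=15)


def active_records(ranges):
    """ranges: list of (start, end) datetimes.

    Returns [(block_start, count)] for each 15-minute block from the hour of
    the earliest start up to the latest end. Blocks with no active record are
    skipped, as the inner JOIN in the query does.
    """
    ts = min(s for s, _ in ranges).replace(minute=0, second=0, microsecond=0)
    last = max(e for _, e in ranges)
    out = []
    while ts <= last:
        block_end = ts + BLOCK - timedelta(seconds=1)  # e.g. 21:00:00 to 21:14:59
        n = sum(1 for s, e in ranges if not (e < ts or s > block_end))
        if n:
            out.append((ts, n))
        ts += BLOCK
    return out


# Sample ranges from the question (UTC)
ranges = [
    (datetime(2020, 11, 1, 21, 4), datetime(2020, 11, 2, 7, 15)),
    (datetime(2020, 11, 1, 21, 45), datetime(2020, 11, 2, 4, 0)),
    (datetime(2020, 11, 1, 22, 0), datetime(2020, 11, 2, 9, 48)),
    (datetime(2020, 11, 1, 22, 0), datetime(2020, 11, 2, 6, 0)),
]

for period, count in active_records(ranges)[:6]:
    print(period, count)
```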

How to split 24 Hour time by 10 minutes

I am trying to build a graph in BI that requires grouping the time variable in my database into 10-minute blocks, i.e. 11:04:00, 11:08:30 and 11:00:28 are all grouped as 11-1, then come 11-2, ..., 11-6, 12-1, and so on:
06:00:00 06-1
06:03:00
06:06:00
06:09:00
06:12:00 06-2
06:15:00
06:18:00
06:21:00 06-2
06:24:00
06:27:00
06:30:00 06-3
06:33:00
06:36:00
06:39:00
06:42:00 06-4
06:45:00
06:48:00
06:51:00 06-5
06:54:00
06:57:00
07:00:00 07-1
07:03:00
07:06:00
07:09:00
07:12:00 07-2
Is there any way I can do this in BI?
Thank you for helping.
This will give you 10-minute block groupings, starting with 0:
=FormatDate([Date];"HH")+"-"+Floor(ToNumber(FormatDate([Date];"mm"))/10)
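The formula takes the hour, then integer-divides the minutes by 10 to get a block number from 0 to 5. Here is a small Python sketch of the same arithmetic for checking labels outside BI; `ten_minute_label` and its `one_based` flag are hypothetical, and `one_based=True` simply adds 1 to reproduce the 11-1 ... 11-6 labels from the question (in the formula that would mean adding 1 to the Floor(...) term).

```python
def ten_minute_label(hhmmss: str, one_based: bool = False) -> str:
    """Label a 'HH:MM:SS' time with its hour and 10-minute block.

    Block 0 covers :00-:09, block 1 :10-:19, ..., block 5 :50-:59, like
    Floor(minutes / 10) in the formula. one_based=True adds 1 to every block.
    """
    hh, mm, _ = hhmmss.split(":")
    block = int(mm) // 10 + (1 if one_based else 0)
    return f"{hh}-{block}"


for t in ["11:00:28", "11:04:00", "11:08:30", "11:59:00", "12:03:00"]:
    print(t, ten_minute_label(t), ten_minute_label(t, one_based=True))
```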

SQL Server: get the start time and end time within multiple night shifts

How can I get the start and end time from this list? I could combine the date with each time and take the MIN and MAX, but as you can see, row 3 actually falls on the next day, yet it has the same ShiftDate because it is part of a night shift.
I have also added a normal day-shift employee so the logic can be checked against both cases.
EmployeeId ShiftDate ShiftStartTime ShiftEndTime
-----------------------------------------------------
20040 2017-11-01 21:00:00 23:00:00
20040 2017-11-01 23:00:00 00:30:00
20040 2017-11-01 00:30:00 06:00:00
20124 2017-11-01 09:00:00 16:30:00
20124 2017-11-01 16:30:00 22:00:00
20124 2017-11-01 22:00:00 22:30:00
I need it like below:
EmployeeId ShiftDate ShiftStartTime ShiftEndTime
----------------------------------------------------
20040 2017-11-01 21:00:00 06:00:00
20124 2017-11-01 09:00:00 22:30:00
In a commercial environment, we solved this by attaching a flag to each shift. The flag indicates the 'reporting date' of the shift: it has a value of 1 if the reporting (administrative) date is the next day, 0 for the same day, and -1 for the previous day (which we never used; it depends on your scenario).
I modified your table to show a possible SHIFTS table, which should probably also have a NAME column (e.g. Morning, Afternoon, Day or Night shift).
ReportFlag ShiftStartTime ShiftEndTime
1 21:00:00 23:00:00
1 23:00:00 00:30:00
0 00:30:00 06:00:00
0 09:00:00 16:30:00
0 16:30:00 22:00:00
1 22:00:00 22:30:00
Notice how I added 1 to say that 'this shift' is actually considered to be on the 'next' day.
You can then use the flag value (0 or 1) with date functions in your queries too, e.g. DATEADD(day, ReportFlag, ShiftDate).
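If you cannot add a flag table, a different technique (not the one described above) is to infer the midnight crossing from the order of the times: walk through an employee's segments in the order they were worked, and whenever a time is earlier than the previous one, move to the next day. The Python sketch below assumes the segments are contiguous, listed in worked order, and that the whole shift is shorter than 24 hours; `shift_span` is a hypothetical helper. Once each time point has its real date, the MIN/MAX approach from the question works.

```python
from datetime import date, datetime, time, timedelta


def shift_span(shift_date, segments):
    """Return (start, end) datetimes for one employee's shift on shift_date.

    segments: (start, end) 'HH:MM:SS' strings in the order they were worked.
    Assumption: a time earlier than the previous time point means midnight
    was crossed, so one day is added from that point on.
    """
    offset = timedelta(0)
    prev = None
    points = []
    for seg in segments:
        for t_str in seg:
            t = datetime.combine(shift_date, time.fromisoformat(t_str)) + offset
            if prev is not None and t < prev:
                offset += timedelta(days=1)
                t += timedelta(days=1)
            points.append(t)
            prev = t
    return points[0], points[-1]


# Employees 20040 (night shift) and 20124 (day shift) from the question
night = [("21:00:00", "23:00:00"), ("23:00:00", "00:30:00"), ("00:30:00", "06:00:00")]
day = [("09:00:00", "16:30:00"), ("16:30:00", "22:00:00"), ("22:00:00", "22:30:00")]
print(shift_span(date(2017, 11, 1), night))
print(shift_span(date(2017, 11, 1), day))
```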

Sql Select Time Stamp and Represent it as 9 Hours Back

Good day. I have a table that collects data with Coordinated Universal Time (UTC) timestamps. However, the location is 9 hours behind UTC. I am writing a query that gets the timestamp and the value but shifts the timestamp 9 hours back, since that is when it was recorded with respect to that location.
My issue is that I keep subtracting days, not hours, even though I specified hours in my DATEDIFF and DATEADD. How do I select a timestamp and the value but represent that timestamp as 9 hours earlier? Thanks for any help.
select DATEADD(hour, DATEDIFF(hour,9,TimeUTC),0) as DateActual, Value
From TableData
Data
2015-12-15 00:00:00 45
2015-12-15 00:00:00 54
Current results
2015-12-06 00:00:00 45
2015-12-06 00:00:00 54
Desired results
2015-12-14 15:00:00 45
2015-12-14 15:00:00 54
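Assuming this is SQL Server (DATEADD/DATEDIFF with hour as the first argument is T-SQL syntax), the likely cause is implicit conversion: an integer used where a datetime is expected is read as a number of days after 1900-01-01, so 0 means 1900-01-01 and 9 means 1900-01-10, not 9 hours. The expression therefore counts hours from 1900-01-10 and adds them back onto 1900-01-01, which lands exactly 9 days early (and truncated to the hour), matching the "Current results". The direct way to go back 9 hours is DATEADD(hour, -9, TimeUTC). The Python sketch below simulates both expressions; the function names are hypothetical and it models DATEDIFF(hour, ...) as counting hour boundaries crossed.

```python
from datetime import datetime, timedelta

# SQL Server reads integer 0 as datetime 1900-01-01 and integer N as N days later.
SQL_ZERO = datetime(1900, 1, 1)


def original_expression(time_utc):
    """Simulates DATEADD(hour, DATEDIFF(hour, 9, TimeUTC), 0)."""
    nine_as_datetime = SQL_ZERO + timedelta(days=9)  # 1900-01-10, not 9 hours
    # DATEDIFF(hour, ...) counts hour boundaries crossed, so truncate to the hour
    truncated = time_utc.replace(minute=0, second=0, microsecond=0)
    hours = (truncated - nine_as_datetime) // timedelta(hours=1)
    return SQL_ZERO + timedelta(hours=hours)


def nine_hours_back(time_utc):
    """What DATEADD(hour, -9, TimeUTC) computes."""
    return time_utc - timedelta(hours=9)


ts = datetime(2015, 12, 15, 0, 0, 0)
print(original_expression(ts))  # 2015-12-06 00:00:00, the "Current results"
print(nine_hours_back(ts))      # 2015-12-14 15:00:00, the "Desired results"
```

In the query itself, that would be select DATEADD(hour, -9, TimeUTC) as DateActual, Value from TableData.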