SQL cohort retention - sql

The question below is copied from another post that asked for a Tableau answer, but I would like to use SQL to avoid performance problems.
I'm trying to calculate user retention rates across dates and for the last 14 days. For example, if 44 users arrive for the first time on September 16th, and then 19 of them show up again on September 17th, our day 1 retention for those September 16th users is 19/44. And if 41 users showed up for the first time on September 17th and 24 of those came back again on September 18th, then the September 17th 1 day retention would be 24/41. And if 18 users returned on September 18th who arrived for the first time on September 16th, then their 2 day retention would be 18/44.
The final outcome I would like to have is shown below. I'm trying to figure out how to calculate the retention for each cohort day by date. The login table contains the following columns: TimeStamp, userid, gamelabel, and play_time.
Login Table
TimeStamp | Userid | GameLabel | playtime |
-----------------------+-----------+------------+-----------
2016-09-16 21:00:24+8 | af07 | LL | 60010 |
2016-09-16 21:00:25+8 | 9dbe | YY | 60016 |
2016-09-16 21:01:24+8 | af07 | SS | 60009 |
The Final Outcome I would like to have
Retention | Today    | Today - 1 Day | Today - 2 Day ... | Today - 12 Day | Today - 13 Day
----------+----------+---------------+-------------------+----------------+---------------
          | 09/29/16 | 09/28/16      | 09/27/16 ...      | 09/17/16       | 09/16/16
0         |          |               |                   | 41/41          | 44/44
1         |          |               |                   | 24/41          | 19/44
2         |          |               |                   |                | 18/44
3         |          |               |                   |                |
7         |          |               |                   |                |
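For what it's worth, here is a minimal sketch of the cohort arithmetic described above, run against an in-memory SQLite database from Python. The sample rows beyond the three shown in the question, the `ts` column name, and dropping the `+8` offset are assumptions made for illustration; the SQL would need adapting to whatever database actually holds the login table. Each user's cohort date is their first login date, and day N retention is the number of cohort users active N days later over the cohort size.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE login (ts TEXT, userid TEXT, gamelabel TEXT, playtime INTEGER)")
# First three rows come from the question; the rest are made up for the demo.
conn.executemany("INSERT INTO login VALUES (?, ?, ?, ?)", [
    ("2016-09-16 21:00:24", "af07", "LL", 60010),
    ("2016-09-16 21:00:25", "9dbe", "YY", 60016),
    ("2016-09-16 21:01:24", "af07", "SS", 60009),
    ("2016-09-17 08:15:00", "af07", "LL", 1200),
    ("2016-09-17 09:00:00", "c3d1", "YY", 300),
    ("2016-09-18 10:30:00", "9dbe", "LL", 450),
    ("2016-09-18 11:00:00", "c3d1", "SS", 800),
])

query = """
WITH first_seen AS (          -- cohort date = first day a user ever logged in
    SELECT userid, MIN(date(ts)) AS cohort_date
    FROM login
    GROUP BY userid
),
activity AS (                 -- one row per user per active day, as a day offset
    SELECT DISTINCT l.userid, f.cohort_date,
           CAST(julianday(date(l.ts)) - julianday(f.cohort_date) AS INTEGER) AS day_n
    FROM login l
    JOIN first_seen f ON f.userid = l.userid
),
cohort_size AS (
    SELECT cohort_date, COUNT(*) AS users
    FROM first_seen
    GROUP BY cohort_date
)
SELECT a.cohort_date, a.day_n, COUNT(*) AS retained, s.users
FROM activity a
JOIN cohort_size s ON s.cohort_date = a.cohort_date
GROUP BY a.cohort_date, a.day_n, s.users
ORDER BY a.cohort_date, a.day_n
"""

result = conn.execute(query).fetchall()
for cohort_date, day_n, retained, users in result:
    print(f"{cohort_date}  day {day_n}: {retained}/{users}")
```

This returns one row per cohort date and day offset; pivoting the cohort dates into columns (and limiting to the last 14 days) would be a separate step in the reporting layer or with conditional aggregation.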

Related

How to query to get dates from the previous fiscal week of this year and the corresponding fiscal week from last year in a dynamic table in GCP?

So say I have a table that refreshes weekly. There's a column for fiscal week (FW). Say we are currently on fiscal week 32, and the way this query should go is that we always need the week prior so FW31 in this example. However, I not only need FW31 of this year, but FW31 of last year too. What's a way to create a dynamic query that would do that, if possible?
Example table below:
YEAR | FW | Dates | Info_1
... | ... | ... | ...
2019 | 30 | 09-02-2020 | blah
2019 | 30 | 09-03-2020 | blah
2019 | 30 | 09-04-2020 | blah
... | ... | ... | ...
2019 | 31 | 09-10-2020 |
... | ... | ... | ...
2020 | 30 | 09-06-2020 |
... | ... | ... | ...
2020 | 31 | 09-14-2020 | blah
2020 | 31 | 09-15-2020 | blah
2020 | 31 | 09-16-2020 | blah
... | ... | ... | ...
As I understand it, this can't be done by date, since a fiscal week this year might not cover the same dates as the same fiscal week last year. So I'm counting on the 'FW' column to pull it. However, the query should update itself each week, moving from 31 to 32 and so on. This is within Google Cloud Platform, so I'd love to save it as a view.
Given that I don't know what you actually mean by "fiscal week", a possible solution could be the following:
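SELECT *
FROM `example_db`.`example_table`
WHERE
  -- MySQL-style syntax. Subtract the week as a date interval:
  -- date(now())-7 is integer arithmetic and breaks early in a month.
  `FW` = weekofyear(date_sub(curdate(), interval 7 day)) AND
  `YEAR` IN (year(now()), year(now())-1);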
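If the fiscal calendar doesn't line up with calendar weeks, another option is to take the "current" week from the table itself rather than from the clock. The sketch below does that in SQLite from Python; it assumes the most recent (year, FW) pair in the table is the current fiscal week, uses made-up rows and ISO-formatted dates, and does not handle the wrap-around when the current week is FW 1.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE example_table (year INTEGER, fw INTEGER, dates TEXT, info_1 TEXT)")
# Hypothetical rows: fiscal weeks 30-32 for two years.
conn.executemany("INSERT INTO example_table VALUES (?, ?, ?, ?)", [
    (2019, 30, "2019-09-03", "a"),
    (2019, 31, "2019-09-10", "b"),
    (2019, 32, "2019-09-17", "c"),
    (2020, 30, "2020-09-08", "d"),
    (2020, 31, "2020-09-15", "e"),
    (2020, 32, "2020-09-22", "f"),
])

query = """
WITH current_week AS (        -- assumption: latest (year, fw) loaded = current week
    SELECT year, fw
    FROM example_table
    ORDER BY year DESC, fw DESC
    LIMIT 1
)
SELECT t.year, t.fw, t.dates, t.info_1
FROM example_table t
CROSS JOIN current_week c
WHERE t.fw = c.fw - 1                      -- the week before the current one
  AND t.year IN (c.year, c.year - 1)       -- this year and last year
ORDER BY t.year, t.dates
"""

prior_week_rows = conn.execute(query).fetchall()
print(prior_week_rows)
```

Because nothing is hard-coded, the same statement keeps working after each weekly refresh, which is what makes it a candidate for a view.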

Finding how many days left per user per year

I have a table that tracks leave days for each user:
ID | Start | End | IDUser
1 | 02-02-2020 | 03-02-2020 | 2
2 | 01-02-2020 | 21-02-2020 | 2
IDUser connects to the Users Table, that has IDUser and Username columns
I have a view (query) that shows the columns above plus a column named UsedDays, which counts how many leave days were used:
DATEDIFF(DAY, dbo.leavedays.start, dbo.leavedays.[end]) + 1
This is what I have now:
Start | End | IDUser | UsedDays
02-02-2020 | 03-02-2020 | 2 | 1
01-02-2020 | 21-02-2020 | 1 | 20
Each user has a total available number of days per year so I would like to have a column that subtracts from those total possible days of each user, and show how many are left.
Example:
John (IDUser = 2) has 30 days available this year and he already used 1, so there are 29 left
Start | End | IDUser | TotalDaysYear | UsedDays | LeftDays
02-02-2020 | 03-02-2020 | 2 | 30 | 1 | 29
01-02-2020 | 21-02-2020 | 1 | 20 | 20 | 0
I believe I have to create a table for TotalDaysYear, probably with:
ID | Year | TotalDaysYear | IDUser
1 | 2020 | 30 | 2
2 | 2020 | 20 | 1
IDUser connects to the Users Table, that has IDUser and Username columns
But I'm having trouble finding the logic for the relationship and how to get the result that I want, since it also depends on the year (available days may change per year, per user).
Assuming you are using SQL Server, this should work:
SELECT
    ld.start,
    ld.[end],
    ld.IDUser,
    ldy.TotalDaysYear,
    SUM(DATEDIFF(DAY, ld.start, ld.[end])+1) OVER (PARTITION BY ld.IDUser, YEAR(ld.start) ORDER BY ld.start) as UsedDays,
    ldy.TotalDaysYear - SUM(DATEDIFF(DAY, ld.start, ld.[end])+1) OVER (PARTITION BY ld.IDUser, YEAR(ld.start) ORDER BY ld.start) as LeftDays
FROM leavedays ld
LEFT JOIN leavedaysperyear ldy
    ON YEAR(ld.start) = ldy.Year AND ld.IDUser = ldy.IDUser
The basic idea is to keep a running total of used days per user, per year, and then subtract it from the total available days for that user during that same year.
Here's a SQLFiddle
NB: The example provided doesn't handle leave periods that span two years.
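One way to deal with the cross-year case is to split each leave period at the year boundary before summing. The following is only a sketch of that idea in Python, not part of the query above; the sample leave rows and allowances are made up, and days are counted inclusively to match the `DATEDIFF(...) + 1` expression.

```python
from collections import defaultdict
from datetime import date


def days_per_year(start, end):
    """Split an inclusive leave period into day counts per calendar year."""
    counts = {}
    for year in range(start.year, end.year + 1):
        seg_start = max(start, date(year, 1, 1))
        seg_end = min(end, date(year, 12, 31))
        counts[year] = (seg_end - seg_start).days + 1
    return counts


# Hypothetical data: (IDUser, start, end) and allowance per (IDUser, year).
leave = [
    (2, date(2020, 2, 2), date(2020, 2, 3)),
    (1, date(2020, 12, 30), date(2021, 1, 2)),  # spans two years
]
allowance = {(2, 2020): 30, (1, 2020): 20, (1, 2021): 20}

used = defaultdict(int)
for user_id, start, end in leave:
    for year, days in days_per_year(start, end).items():
        used[(user_id, year)] += days

left = {key: allowance[key] - days for key, days in used.items()}
print(left)
```

The same split could be done in SQL by joining the leave rows to a small table of years and clamping the start and end dates to each year, then feeding the result into the running total.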

How can I get a record to be counted in multiple columns of a Crosstab Query?

Background information:
My company requires employees to maintain at least one certification (cert) on a position. There are a total of 17 different certifications that an employee can get.
An employee can hold multiple certs. But on any one day they can only "sit" one of the positions that they are certified in. Most employees primarily sit the highest level position that they hold a cert in, but can sit a lower level position if there are manning shortages in that position and if they hold that particular cert (some employees come to us holding the higher level certs but none of the lower ones because they let them expire).
Multiple employees can hold the same cert.
Around 90% of employees are on contract, meaning they have a set termination date. Contracts can be extended but for the sake of this Access database, and the report to be generated, we're presuming that the termination date is set in stone.
My boss (and boss' boss) are wanting to put together a manning projection report so that they don't get caught off guard should we start running low on employees certified in any one position.
Example of what they want:
Let's say you have three employees:
Employee1 has certs in position1, position2, and position3 but he primarily sits as position3 and his contract expires June 2020.
Employee2 has certs in position1 and position2 but primarily sits as position2 and her contract expires in February 2022.
Employee3 is new and arrived August 2019 and is in training to get position1, maximum allowed training time for initial cert is 3 months, so presumably he should have his position1 cert by December 2019 and his contract expires August 2025.
Let's say my boss wants to project out 12 months, starting in November 2019 (he'll only be able to select a starting month-year that is equal to or later than the current month-year). The charts below, which are generated in subreports, are what should be generated from the employee information above.
All Certifications Chart
+-----------+--------+--------+--------+--------+--------+--------+--------+--------+--------+--------+--------+--------+
| Cert | Nov 19 | Dec 19 | Jan 20 | Feb 20 | Mar 20 | Apr 20 | May 20 | Jun 20 | Jul 20 | Aug 20 | Sep 20 | Oct 20 |
+-----------+--------+--------+--------+--------+--------+--------+--------+--------+--------+--------+--------+--------+
| Position1 | 2 | 3 | 3 | 3 | 3 | 3 | 3 | 2 | 2 | 2 | 2 | 2 |
| Position2 | 2 | 2 | 2 | 2 | 2 | 2 | 2 | 1 | 1 | 1 | 1 | 1 |
| Position3 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 0 | 0 | 0 | 0 | 0 |
+-----------+--------+--------+--------+--------+--------+--------+--------+--------+--------+--------+--------+--------+
Primary Certifications Chart
+-----------+--------+--------+--------+--------+--------+--------+--------+--------+--------+--------+--------+--------+
| Cert | Nov 19 | Dec 19 | Jan 20 | Feb 20 | Mar 20 | Apr 20 | May 20 | Jun 20 | Jul 20 | Aug 20 | Sep 20 | Oct 20 |
+-----------+--------+--------+--------+--------+--------+--------+--------+--------+--------+--------+--------+--------+
| Position1 | 0 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 |
| Position2 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 |
| Position3 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 0 | 0 | 0 | 0 | 0 |
+-----------+--------+--------+--------+--------+--------+--------+--------+--------+--------+--------+--------+--------+
Now I already have a solution in place but it's extremely inefficient and involves a query for each cell (2 Charts X 12 Months X 17 positions = 408 Queries when a report is generated). I'm hoping to do something more efficient with a crosstab query.
The tables are set up as such (only listing relevant fields):
Emp_table
ID (autoNum)
contractStarted (Date)
contractEnd (Date)
Cert_individual
ID (autoNum)
certID (num, many->one relationship to cert_table.ID)
EmpID (num, many->one relationship to Emp_table.ID)
date_cert_received (date)
primary (yes/no)
cert_table
ID (autoNum)
cert_name (short text)
Obviously I'd need to do a couple of INNER JOINS in order to get everything together and I tried using the format from this website for my crosstab query but it would only add an individual cert to a count on the month-year that the employee received it and not to every month that the employee will hold the cert.
So my question is:
Is there a way in SQL or VBA to get a cert counted across multiple columns (month-years) based off of when the employee received the cert and when their contract is scheduled to terminate?
As far as I know, the main problem in getting the crosstab query is that it can only generate columns with data that you already have.
A solution for you to get the monthly columns would be to have a side table with the 12 dates and then use the Cartesian product to generate the monthly data for each of your records in your certification table. This "date" table can be updated and maintained to match the months that you require in your report with a query.
For example, if you have a table named TempDates :
And a table with Employees with the following data :
You can generate the cartesian product with a query that I named QryCertsDates :
SELECT Employees.*, TempDates.* FROM TempDates, Employees;
Which lets you attach all the wanted dates with your original date from the table Employees in order to obtain data similar to below :
Now you can generate your crosstab query pivoting on the month and year and filtering the dates with the WHERE criteria such as :
TRANSFORM Count(QryCertsDates.Cert) AS CountOfCert
SELECT QryCertsDates.Cert
FROM QryCertsDates
WHERE (((CDate([Yr] & "-" & [Mo])) Between CDate([Start]) And CDate([Expire])))
GROUP BY QryCertsDates.Cert
PIVOT CDate([Yr] & "-" & Format([Mo],"00"));
You will end up ultimately with something like this :
You can do the same thing to get your second table/report as well. I don't know your database structure, so you will most likely need to do some adaptation. The other possible way that you can achieve a similar result would be to fill in a table using VBA.
However, this might be the easier solution to implement. Good luck!
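To make the Cartesian-product idea concrete outside of Access, here is a small sketch using SQLite from Python. The table and column names (`temp_dates`, `certs`, `received`, `contract_end`) and the cert dates are assumptions for the demo, and only four of the twelve months are loaded. Note one deliberate difference from the `Between` filter above: the contract-end month is treated as exclusive, because the question's chart stops counting Employee1 in Jun 20.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE temp_dates (month_start TEXT);
CREATE TABLE certs (emp TEXT, cert TEXT, received TEXT, contract_end TEXT);
INSERT INTO temp_dates VALUES
  ('2019-11-01'), ('2019-12-01'), ('2020-05-01'), ('2020-06-01');
INSERT INTO certs VALUES
  ('Employee1', 'Position1', '2018-01-15', '2020-06-30'),
  ('Employee1', 'Position2', '2018-01-15', '2020-06-30'),
  ('Employee1', 'Position3', '2018-01-15', '2020-06-30'),
  ('Employee2', 'Position1', '2018-03-01', '2022-02-28'),
  ('Employee2', 'Position2', '2018-03-01', '2022-02-28'),
  ('Employee3', 'Position1', '2019-12-10', '2025-08-31');
""")

# Cross join every cert row with every report month, keep the months in which
# the cert is held, then count holders per cert and month.
query = """
SELECT c.cert, d.month_start, COUNT(*) AS holders
FROM certs AS c
CROSS JOIN temp_dates AS d
WHERE d.month_start >= date(c.received, 'start of month')
  AND d.month_start <  date(c.contract_end, 'start of month')
GROUP BY c.cert, d.month_start
"""

months = [r[0] for r in conn.execute("SELECT month_start FROM temp_dates ORDER BY month_start")]
cert_names = sorted(r[0] for r in conn.execute("SELECT DISTINCT cert FROM certs"))

# Pivot in Python, defaulting to 0 so months with no holders still show up.
table = {cert: {m: 0 for m in months} for cert in cert_names}
for cert, month, holders in conn.execute(query):
    table[cert][month] = holders

for cert in cert_names:
    print(cert, [table[cert][m] for m in months])
```

Adding `AND c.is_primary = 1` (a hypothetical flag column) to the WHERE clause would give the primary-certification variant of the same report.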

ORACLE SQL query to get rows of intervals of 30 minutes based on two hours

I need a query that returns rows at 30-minute intervals between two times, start_hour and end_hour.
I have a table with the columns start_hour and end_hour.
Assume it contains this row:
| start_hour | end_hour |
| 09:00AM | 08:00PM |
I need a query that gives a result like this:
| intervals |
| 09:00AM |
| 09:30AM |
| 10:00AM |
| 10:30AM |
| 11:00AM |
| 11:30AM |
| 12:00PM |
| 12:30PM |
...
...
...
| 07:30PM |
| 08:00PM |
The rows need to end at the end_hour value stored in the table, as shown in the example.
Can someone help me with this? I tried rounding start_hour, but I didn't get any result.
This is a bit clunky and will take a bit of editing based on your specific needs, but it's a very slightly modified bit of code I used a few years back that should work as a solid starting point for you:
select to_char(time_slot,'HH:MIPM')
from (select trunc(to_date('05/23/2019','MM/DD/YYYY'))+(rownum-1)*(30/24/60) time_slot
      from dual
      connect by level <= (24*2))
where to_char(time_slot,'HH24:MI') between
      --start_hour
      '09:00'
      and
      --end hour
      '20:00';
OUTPUT
09:00AM
09:30AM
10:00AM
10:30AM
11:00AM
11:30AM
12:00PM
12:30PM
01:00PM
01:30PM
02:00PM
02:30PM
03:00PM
03:30PM
04:00PM
04:30PM
05:00PM
05:30PM
06:00PM
06:30PM
07:00PM
07:30PM
08:00PM
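As a cross-check on the expected output, the same slot generation can be sketched in a few lines of Python. The function name `half_hour_slots` is made up for this example; it simply steps from the start time to the end time (inclusive) in 30-minute increments.

```python
from datetime import datetime, timedelta


def half_hour_slots(start_hour, end_hour, step_minutes=30):
    """Return time labels from start_hour to end_hour inclusive."""
    fmt = "%I:%M%p"  # e.g. 09:00AM
    current = datetime.strptime(start_hour, fmt)
    end = datetime.strptime(end_hour, fmt)
    slots = []
    while current <= end:
        slots.append(current.strftime(fmt))
        current += timedelta(minutes=step_minutes)
    return slots


slots = half_hour_slots("09:00AM", "08:00PM")
print(len(slots), slots[0], slots[-1])
```

For 09:00AM to 08:00PM this yields 23 rows, matching the list above.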

Group and count activity by week in either Ruby, Activerecord, or Postgresql

I have an activity log that stretches across a few years. I have been asked to calculate weekly engagement for each user for the application. I define engagement as a user doing one or more logged activities in any given week.
How do I group those activities and count them by week for each user? I have read a lot of different posts, and there seems to be debate about whether ruby methods, sql or arel syntax are best. I don't have more than 500 users so performance is not a concern as much as something that is succinct.
I have successfully tried this:
user = User.first.activity_logs.group_by { |m| m.created_at.beginning_of_week }
# => {Mon, 11 Mar 2013 00:00:00 EDT -04:00=>
[#<ActivityLog id: 12345, user_id: 429, ... ]}
Then the only next step I can get to return anything without error:
user.map { |week| week.count } => [2, 2, 2, 2, 2, 2, 2, 2]
So it seems like I am making this too complicated. How do I succinctly count the number of activities by week and do that for each user?
I just want something that I can ultimately paste into a spreadsheet (for example, below) to make a heat map or some other chart for managers.
| User | Week | Activity|
| ------------- | :-------------: | -------:|
| jho | 2013-1 | 20 |
| bmo | 2013-1 | 5 |
| jlo | 2013-1 | 11 |
| gdo | 2013-2 | 2 |
| gdo | 2013-5 | 3 |
| jho | 2013-6 | 5 |
EDIT
As reference for others:
Rails 3.1
Using PostgreSQL 9.1.4
Here is the schema file from ruby on rails
create_table "activity_logs", :force => true do |t|
  t.integer "user_id"
  t.string "activity_type"
  t.datetime "created_at"
  t.datetime "updated_at"
end
+-------+---------+-----------------+-----------------+-----------------+
| id    | user_id | activity_type   | created_at      | updated_at      |
+-------+---------+-----------------+-----------------+-----------------+
| 28257 | 8       | User Signin     | 2013-02-14 1... | 2013-02-14 1... |
| 25878 | 7       | Password Res... | 2013-02-03 1... | 2013-02-03 1... |
| 25879 | 7       | User Signin     | 2013-02-03 1... | 2013-02-03 1... |
| 25877 | 8       | Password Res... | 2013-02-03 1... | 2013-02-03 1... |
| 19325 | 8       | Created report  | 2012-12-16 0... | 2012-12-16 0... |
| 19324 | 9       | Added product   | 2012-12-16 0... | 2012-12-16 0... |
| 18702 | 8       | Added event     | 2012-12-15 1... | 2012-12-15 1... |
| 18701 | 1       | Birthday Email  | 2012-12-15 0... | 2012-12-15 0... |
+-------+---------+-----------------+-----------------+-----------------+
SOLUTION
Modifying @Erwin Brandstetter's command, I got the desired result like so on the command line:
ActivityLogs.find_by_sql("
SELECT user_id, to_char(created_at, 'YYYY-WW') AS week, count(*) AS activity
FROM activity_logs
GROUP BY 1, 2
ORDER BY 1, 2;")
I borrowed the test table from @ideamotor and simplified it. Type of activity is irrelevant, counting each activity as 1:
CREATE TEMP TABLE log(usr text, day date);
INSERT INTO log VALUES
('bob' , '2012-01-01')
,('bob' , '2012-01-02')
,('bob' , '2012-01-14')
,('susi', '2012-01-01')
,('susi', '2012-01-14');
Query (won't get much more succinct than this):
SELECT usr, to_char(day, 'YYYY-WW') AS week, count(*) AS activity
FROM log
GROUP BY 1, 2
ORDER BY 1, 2;
Result:
usr | week | activity
-----+----------+---------
bob | 2012-01 | 2
bob | 2012-02 | 1
susi | 2012-01 | 1
susi | 2012-02 | 1
to_char() makes this very simple. I quote the manual here:
WW week number of year (1-53) (The first week starts on the first day of the year.)
As an alternative, consider:
IW ISO week number of year (01 - 53; the first Thursday of the new year is in week 1.)
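To see how the two numbering schemes differ on the test dates, here is a small Python sketch. The helper `ww_week` is a hypothetical re-implementation of the WW rule quoted above (week 1 starts on January 1), and the ISO values come from Python's `date.isocalendar()`. It also shows why an ISO week is best paired with the ISO year (PostgreSQL's `IYYY` pattern) rather than `YYYY`: 2012-01-01 falls in ISO week 52 of 2011.

```python
from datetime import date


def ww_week(d):
    """Week of year where week 1 starts on January 1 (the WW rule)."""
    return (d.timetuple().tm_yday - 1) // 7 + 1


def week_labels(d):
    """Return (calendar-year/WW label, ISO-year/ISO-week label) for a date."""
    iso_year, iso_week, _ = d.isocalendar()
    return f"{d.year}-{ww_week(d):02d}", f"{iso_year}-{iso_week:02d}"


labels = {d.isoformat(): week_labels(d)
          for d in (date(2012, 1, 1), date(2012, 1, 2), date(2012, 1, 14))}
for day, (ww, iso) in labels.items():
    print(day, ww, iso)
```

Which scheme is right depends on how "a week" is defined for the engagement report; the point is only that the two can disagree around the turn of the year.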
Here it is in PostgreSQL. The trick is that you need to generate your year-weekofyear value. Here I am pulling the parts out of the date and concatenating them.
Here I am ensuring that '2012-01-01' does not get counted as the 52nd week. I'm overriding the standard. You may need to alter this function depending on how you define your weeks.
create temp table daily_log(person character varying, activity numeric,
dayof date);
insert into daily_log values
('bob' ,1,'2012-01-01')
,('bob' ,1,'2012-01-02')
,('bob' ,0,'2012-01-14')
,('charlie',1,'2012-01-01')
,('charlie',1,'2012-01-14');
select person
,extract('year' from dayof) || '-' ||
case when extract('week' FROM dayof) >= 52
and extract('month' FROM dayof) = 1
then 1
else extract('week' FROM dayof) end as weekof
,sum(activity) as activity_cnt
from daily_log
group by weekof, person
order by person, weekof;
That will get you:
| person | weekof | activity_cnt|
| -------------:| :--------------:| -----------:|
| bob | 2012-1 | 2 |
| bob | 2012-2 | 0 |
| charlie | 2012-1 | 1 |
| charlie | 2012-2 | 1 |
Why I used 2012, I don't know.
Here is what the PostgreSQL manual says about extracting the week (http://www.postgresql.org/docs/9.2/static/functions-datetime.html):
"The number of the week of the year that the day is in. By definition (ISO 8601), the first week of a year contains January 4 of that year. (The ISO-8601 week starts on Monday.) In other words, the first Thursday of a year is in week 1 of that year. (for timestamp values only)
Because of this, it is possible for early January dates to be part of the 52nd or 53rd week of the previous year. For example, 2005-01-01 is part of the 53rd week of year 2004, and 2006-01-01 is part of the 52nd week of year 2005."