Dynamically compare week-to-week logins by cohort - SQL

Objective:
Get the IDs that logged in during week 1, then count how many of those IDs also logged in during week 2.
Repeat the same logic for week 2 to week 3.
Then week 3 to week 4, and so on... This exercise needs to be run every week.
The IDs need to be segmented into cohorts, defined by the month and year they subscribed.
Story:
The first table (member) has the email and its creation date. The second table (login) holds the login activity. First, I need to group emails by creation date (month-year) to create the cohorts.
Then, I need to compare the login activity week to week for each cohort. Is it possible
for this query to be dynamic each week?
Output:
The result should look like this:
+--------+--------+--------+--------+--------+
| Cohort | 2019-1 | 2019-2 | 2019-3 | 2019-4 |...
+--------+--------+--------+--------+--------+
| 2018-3 |   7000 |   6800 |   7400 |   7100 |...
| 2018-4 |   6800 |   6500 |   8400 |   8000 |...
| 2018-5 |   9500 |   8000 |   6400 |   6200 |...
| 2018-6 |   9100 |   8500 |   8000 |   7800 |...
| 2018-7 |  10000 |   8000 |   7000 |   6800 |...
+--------+--------+--------+--------+--------+
What I tried:
SELECT CONCAT(DATEPART(YEAR, m.date_created), '-', DATEPART(MONTH, m.date_created)) AS Cohort,
       CONCAT(subquery.[YYYY], '-', subquery.[ISO]) AS YYYY_ISO,
       m.email
FROM member AS m
INNER JOIN (SELECT DATEPART(YEAR, log.login_time) AS [YYYY],
                   DATEPART(ISO_WEEK, log.login_time) AS [ISO],
                   log.email,
                   ROW_NUMBER() OVER (PARTITION BY DATEPART(YEAR, log.login_time),
                                                   DATEPART(ISO_WEEK, log.login_time),
                                                   log.email
                                      ORDER BY log.login_time ASC) AS Log_Rank
            FROM login AS log
            WHERE CAST(log.login_time AS DATE) >= '2019-01-01'
           ) AS subquery ON m.email = subquery.email AND subquery.Log_Rank = 1
ORDER BY Cohort
Sample Data:
CREATE TABLE member
([email] varchar(50), [date_created] Datetime)
CREATE TABLE login
([email] varchar(50), [login_time] Datetime)
INSERT INTO member
VALUES
('player123#google.com', '2018-03-01 05:00:00'),
('player999#google.com', '2018-04-12 12:00:00'),
('player555#google.com', '2018-04-25 20:15:00')
INSERT INTO login
VALUES
('player123#google.com', '2019-01-07 05:30:00'),
('player123#google.com', '2019-01-08 08:30:00'),
('player123#google.com', '2019-01-15 06:30:00'),
('player999#google.com', '2019-01-08 11:30:00'),
('player999#google.com', '2019-01-10 07:30:00'),
('player555#google.com', '2019-01-08 04:30:00')
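Building on the deduplicated weekly logins in the attempt above, one way to produce the pivoted cohort counts could be conditional aggregation. This is only a sketch: the week columns are hard-coded (2019-2 and 2019-3 match the sample data), so a truly dynamic version would have to build the same SELECT list with dynamic SQL (e.g. STRING_AGG plus sp_executesql).

```sql
-- Sketch: distinct members per cohort for each ISO week (hard-coded week columns).
WITH weekly AS (
    SELECT DISTINCT
           CONCAT(DATEPART(YEAR, m.date_created), '-', DATEPART(MONTH, m.date_created)) AS Cohort,
           DATEPART(YEAR, l.login_time)     AS yyyy,
           DATEPART(ISO_WEEK, l.login_time) AS iso,
           m.email
    FROM member AS m
    INNER JOIN login AS l ON l.email = m.email
    WHERE CAST(l.login_time AS DATE) >= '2019-01-01'
)
SELECT Cohort,
       COUNT(DISTINCT CASE WHEN yyyy = 2019 AND iso = 2 THEN email END) AS [2019-2],
       COUNT(DISTINCT CASE WHEN yyyy = 2019 AND iso = 3 THEN email END) AS [2019-3]
FROM weekly
GROUP BY Cohort
ORDER BY Cohort;
```

Note that this counts active members per week per cohort, as in the output table; measuring "of week N's members, how many returned in week N+1" would additionally need a self-join of weekly to itself on email with iso + 1.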

Related

How to list unique entries in a column, with the corresponding count of their repetitions in the next column?

I have a hospital database which looks something like this:
id | patient_name | admitDate | DischargeDate
1 | john | 3/01/2011 08:50 | 5/01/2011 12:50
2 | lisa | 3/01/2011 09:50 | 4/01/2011 13:50
3 | ron | 5/01/2012 10:40 | 10/01/2012 03:50
4 | howard | 6/02/2013 08:05 | 10/02/2013 08:50
5 | john | 6/02/2013 12:04 | 7/02/2013 01:50
The admitDate is the same for many entries (the time may differ). I want to find out how many patients were admitted on any particular day, so if I do this:
select distinct left(admitDate, 10),
       (select count(distinct left(admitDate, 10)) from hospital)
from hospital
I get all distinct admit dates in the 1st column, but the same value, 5, in every row of the 2nd column. How do I make it so that only the corresponding repetition count appears in the 2nd column, and not the count of the entire admitDate set?
The data type of admitDate is varchar(50).
I am using the LEFT function because I only need to find uniqueness in the dates, not in the times.
Expected result:
admitDate | Count
3/01/2011 | 2
5/01/2012 | 1
6/02/2013 | 2
Current result:
admitDate | Count
3/01/2011 | 5
5/01/2012 | 5
6/02/2013 | 5
If your admitDate column has a time component too, you need to use the CONVERT() function to eliminate the time and group your data per day:
Select CONVERT(date, admitDate), count(*)
from hospital
group by CONVERT(date, admitDate);
If you use varchar instead of a date data type for your admitDate column, you can try this:
SELECT LEFT(admitDate, charindex(' ', admitDate) - 1) as ADMITDATE , count(*) as COUNTER
from hospital
group by LEFT(admitDate, charindex(' ', admitDate) - 1) ;
or:
SELECT convert(date, (convert(datetime2, admitDate,103)) ), count(*)
from hospital
group by convert(date, (convert(datetime2, admitDate,103)) )
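If the varchar column may also contain malformed values, TRY_CONVERT (available since SQL Server 2012) returns NULL instead of raising an error. A variation on the last query above, with the same assumed style 103 (dd/mm/yyyy):

```sql
SELECT TRY_CONVERT(date, admitDate, 103) AS AdmitDate, COUNT(*) AS Counter
FROM hospital
GROUP BY TRY_CONVERT(date, admitDate, 103);
```

Rows whose admitDate cannot be parsed end up in a single NULL group instead of aborting the whole query.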

Querying the retention rate on multiple days with SQL

Given a simple data model that consists of a user table and a check_in table with a date field, I want to calculate the retention rate of my users. So, for all users with one or more check-ins, I want the percentage of users who did a check-in on their 2nd day, on their 3rd day, and so on.
My SQL skills are pretty basic as it's not a tool I use often in my day-to-day work, and I know this is beyond the types of queries I am used to. I've been looking into pivot tables to achieve this, but I am unsure if this is the correct path.
Edit:
The user table does not have a registration date. One can assume it only contains the ID for this example.
Here is some sample data for the check_in table:
| user_id | date |
=====================================
| 1 | 2020-09-02 13:00:00 |
-------------------------------------
| 4 | 2020-09-04 12:00:00 |
-------------------------------------
| 1 | 2020-09-04 13:00:00 |
-------------------------------------
| 4 | 2020-09-04 11:00:00 |
-------------------------------------
| ... |
-------------------------------------
And the expected output of the query would be something like this:
| day_0 | day_1 | day_2 | day_3 |
=================================
| 70% | 67 % | 44% | 32% |
---------------------------------
Please note that I've used random numbers for this output just to illustrate the format.
Oh, I see. Assuming you mean days between checkins for users -- and users might have none -- then just use aggregation and window functions:
-- Fixes to the original query: the missing join condition was added (assuming the
-- user table's key column is named id), plus the GROUP BY that num_users requires;
-- booleans are cast via ::int because boolean has no direct cast to numeric.
select sum( (ci.date = ci.min_date)::int )::numeric / u.num_users as day_0,
       sum( (ci.date = ci.min_date + interval '1 day')::int )::numeric / u.num_users as day_1,
       sum( (ci.date = ci.min_date + interval '2 day')::int )::numeric / u.num_users as day_2
from (select u.*, count(*) over () as num_users
      from users u
     ) u left join
     (select ci.user_id, ci.date::date as date,
             min(min(date::date)) over (partition by user_id order by date) as min_date
      from checkins ci
      group by user_id, ci.date::date
     ) ci on ci.user_id = u.id
group by u.num_users;
Note that this aggregates the checkins table by user id and date. This ensures that there is only one row per date.
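If a long format (one row per day offset) is acceptable instead of one column per day, the per-user first check-in date can be joined back. A sketch, assuming the table is named check_in(user_id, date) as in the question and taking users with at least one check-in as the denominator:

```sql
-- For each offset in days since a user's first check-in, the percentage of
-- check-in users who were active on that offset.
SELECT ci.date::date - f.first_day AS day_offset,
       round(100.0 * count(DISTINCT ci.user_id)
             / (SELECT count(DISTINCT user_id) FROM check_in), 1) AS pct
FROM check_in AS ci
JOIN (SELECT user_id, min(date::date) AS first_day
      FROM check_in
      GROUP BY user_id) AS f USING (user_id)
GROUP BY day_offset
ORDER BY day_offset;
```

Pivoting those rows into the day_0 | day_1 | ... layout could then be done in the presentation layer, or with one conditional-aggregation column per offset as in the answer above.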

How to create a chart to get number of account for a customer by period in sql

I have an issue: we want to create a query that returns a customer's number of accounts by period.
For each account I have : accountid, customerid, createddate and deleteddate
select accountid,customerid, createddate , deleteddate from account
where customerid = 1
This customer has 4 accounts:
accountid | customerid | createddate | deleteddate
2145 | 6641 | 2018-12-12 10:39:16.457 | 2020-03-26 00:00:12.540
2718 | 6641 | 2020-02-11 15:04:51.643 | 2020-03-26 00:00:04.947
2825 | 46818 | 2020-04-14 15:28:30.400 | 2020-04-29 15:58:30.651
2851 | 46818 | 2020-06-05 12:41:45.790 | NULL
So I want a chart for the current year showing the number of accounts the customer has, not per month but at each modification.
For example, on 02/01/2020 I would have 1 account,
and on 03/01/2020 I would have 0 accounts.
Is it possible to do that, or something like it, in SQL? And if so, how can I do it?
get the nb of account of the customer not for each month but for each modification
Is this what you want?
select x.customer_id,
       x.modifdate,
       sum(x.cnt) over (partition by x.customer_id order by x.modifdate) as no_active_accounts
from mytable t
cross apply (
    values (customer_id, createddate, 1),
           (customer_id, deleteddate, -1)
) as x(customer_id, modifdate, cnt)
where x.modifdate is not null
For each customer, this generates one record every time an account is created or deleted, with the modification date and the running count of active accounts.
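To limit the output to the current year, the running count still has to be computed over the full history first (earlier creations and deletions set the starting balance), so the filter goes outside the windowed query. A sketch using the same assumed mytable layout:

```sql
-- Running account count over all history, then filtered to the current year.
SELECT s.customer_id, s.modifdate, s.no_active_accounts
FROM (
    SELECT x.customer_id,
           x.modifdate,
           SUM(x.cnt) OVER (PARTITION BY x.customer_id ORDER BY x.modifdate) AS no_active_accounts
    FROM mytable t
    CROSS APPLY (VALUES (customer_id, createddate, 1),
                        (customer_id, deleteddate, -1)) AS x(customer_id, modifdate, cnt)
    WHERE x.modifdate IS NOT NULL
) AS s
WHERE s.modifdate >= DATEFROMPARTS(YEAR(GETDATE()), 1, 1);
```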

How to create a table that loops over data in Postgres

I want to create a table that returns the top 10 cons_name by aggregated sum over a trailing week, recalculated for every day.
So for 5/29/2019 it will pull the top 10 cons_name by their sum dating back to 5/22/2019.
Then, for 5/28/2019, the top 10 cons_name by their sum back to 5/21/2019.
A table of top 10 dating back 7 days all the way to 2018-12-01.
I can write the simple version dating back 7 days, but I have tried window functions to no avail.
SELECT cons_name,
pricedate,
sum(shadow)
FROM spp.rtbinds
WHERE pricedate >= current_date - 7
GROUP BY cons_name, shadow, pricedate
ORDER BY shadow asc
LIMIT 10
This query generates the output below
cons_name pricedate sum
"TEMP17_24078" "2019-05-28 00:00:00" "-1473.29723333333"
"TEMP17_24078" "2019-05-28 00:00:00" "-1383.56638333333"
"TMP175_24736" "2019-05-23 00:00:00" "-1378.40504166667"
"TMP159_24149" "2019-05-23 00:00:00" "-1328.847675"
"TMP397_24836" "2019-05-23 00:00:00" "-1221.19560833333"
"TEMP17_24078" "2019-05-28 00:00:00" "-1214.9914"
"TMP175_24736" "2019-05-23 00:00:00" "-1123.83254166667"
"TEMP72_22893" "2019-05-29 00:00:00" "-1105.93840833333"
"TMP164_23704" "2019-05-24 00:00:00" "-1053.051375"
"TMP175_24736" "2019-05-27 00:00:00" "-1043.52104166667"
I would like a table and function that returns a table of each day's top 10 dating back a week.
Using window functions gets you on the right track, but you should read further into the documentation about the possibilities.
We have multiple issues here that we need to solve:
gaps in the data (missing pricedate) would keep us from having the correct number of rows (7) to calculate the overall sum
for the calculation itself we need all data rows, so the WHERE clause cannot be used to limit the query to only the visible days
in order to select the top 10 for each day, we have to generate a row number per partition, because the LIMIT clause cannot be applied per group
This is why I came up with the following CTE's:
CTE days: generate the gap-less date series and mark visible days
CTE daily: LEFT JOIN the data to the generated days and produce daily sums (and handle NULL entries)
CTE calc: produce the cumulative sums
CTE numbered: produce row numbers reset each day
select the actual visible rows and limit them to max. 10 per day
So for a specific week (2019-05-26 - 2019-06-01), the query will look like the following:
WITH
days (c_day, c_visible, c_lookback) as (
SELECT gen::date, (CASE WHEN gen::date < '2019-05-26' THEN false ELSE true END), gen::date - 6
FROM generate_series('2019-05-26'::date - 6, '2019-06-01'::date, '1 day'::interval) AS gen
),
daily (cons_name, pricedate, shadow_sum) AS (
SELECT
r.cons_name,
r.pricedate::date,
coalesce(sum(r.shadow), 0)
FROM days
LEFT JOIN spp.rtbinds AS r ON (r.pricedate::date = days.c_day)
GROUP BY 1, 2
),
calc (cons_name, pricedate, shadow_sum) AS (
SELECT
cons_name,
pricedate,
sum(shadow_sum) OVER (PARTITION BY cons_name ORDER BY pricedate ROWS BETWEEN 6 PRECEDING AND CURRENT ROW)
FROM daily
),
numbered (cons_name, pricedate, shadow_sum, position) AS (
SELECT
calc.cons_name,
calc.pricedate,
calc.shadow_sum,
ROW_NUMBER() OVER (PARTITION BY calc.pricedate ORDER BY calc.shadow_sum DESC)
FROM calc
)
SELECT
days.c_lookback,
numbered.cons_name,
numbered.shadow_sum
FROM numbered
INNER JOIN days ON (days.c_day = numbered.pricedate AND days.c_visible)
WHERE numbered.position < 11
ORDER BY numbered.pricedate DESC, numbered.shadow_sum DESC;
Online example with generated test data: https://dbfiddle.uk/?rdbms=postgres_11&fiddle=a83a52e33ffea3783207e6b403bc226a
Example output:
c_lookback | cons_name | shadow_sum
------------+--------------+------------------
2019-05-26 | TMP400_27000 | 4578.04474575352
2019-05-26 | TMP700_25000 | 4366.56857151864
2019-05-26 | TMP200_24000 | 3901.50325547671
2019-05-26 | TMP400_24000 | 3849.39595793188
2019-05-26 | TMP700_28000 | 3763.51693260809
2019-05-26 | TMP600_26000 | 3751.72016620729
2019-05-26 | TMP500_28000 | 3610.75970225036
2019-05-26 | TMP300_26000 | 3598.36888491176
2019-05-26 | TMP600_27000 | 3583.89777677553
2019-05-26 | TMP300_21000 | 3556.60386707587
2019-05-25 | TMP400_27000 | 4687.20302128047
2019-05-25 | TMP200_24000 | 4453.61603102228
2019-05-25 | TMP700_25000 | 4319.10566615313
2019-05-25 | TMP400_24000 | 4039.01832416654
2019-05-25 | TMP600_27000 | 3986.68667223025
2019-05-25 | TMP600_26000 | 3879.92447655788
2019-05-25 | TMP700_28000 | 3632.56970774056
2019-05-25 | TMP800_25000 | 3604.1630071504
2019-05-25 | TMP600_28000 | 3572.50801157858
2019-05-25 | TMP500_27000 | 3536.57885829499
2019-05-24 | TMP400_27000 | 5034.53660146287
2019-05-24 | TMP200_24000 | 4646.08844632655
2019-05-24 | TMP600_26000 | 4377.5741555281
2019-05-24 | TMP700_25000 | 4321.11906399066
2019-05-24 | TMP400_24000 | 4071.37184911687
2019-05-24 | TMP600_25000 | 3795.00857752701
2019-05-24 | TMP700_26000 | 3518.6449117614
2019-05-24 | TMP600_24000 | 3368.15348120732
2019-05-24 | TMP200_25000 | 3305.84444172308
2019-05-24 | TMP500_28000 | 3162.57388606668
2019-05-23 | TMP400_27000 | 4057.08620966971
2019-05-23 | TMP700_26000 | 4024.11812392669
...
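Since the exercise repeats every day, the two fixed dates can be parameterized. An untested sketch (the function name top10_per_day is made up) that wraps the same CTE chain in a SQL function:

```sql
-- Hypothetical wrapper: the two date literals become parameters.
CREATE OR REPLACE FUNCTION top10_per_day(p_start date, p_end date)
RETURNS TABLE (c_lookback date, cons_name text, shadow_sum numeric)
LANGUAGE sql STABLE AS $$
WITH
days (c_day, c_visible, c_lookback) AS (
    SELECT gen::date, gen::date >= p_start, gen::date - 6
    FROM generate_series(p_start - 6, p_end, '1 day'::interval) AS gen
),
daily (cons_name, pricedate, shadow_sum) AS (
    SELECT r.cons_name, r.pricedate::date, coalesce(sum(r.shadow), 0)
    FROM days
    LEFT JOIN spp.rtbinds AS r ON (r.pricedate::date = days.c_day)
    GROUP BY 1, 2
),
calc (cons_name, pricedate, shadow_sum) AS (
    SELECT cons_name, pricedate,
           sum(shadow_sum) OVER (PARTITION BY cons_name ORDER BY pricedate
                                 ROWS BETWEEN 6 PRECEDING AND CURRENT ROW)
    FROM daily
),
numbered (cons_name, pricedate, shadow_sum, position) AS (
    SELECT calc.cons_name, calc.pricedate, calc.shadow_sum,
           ROW_NUMBER() OVER (PARTITION BY calc.pricedate ORDER BY calc.shadow_sum DESC)
    FROM calc
)
SELECT days.c_lookback, numbered.cons_name, numbered.shadow_sum
FROM numbered
INNER JOIN days ON (days.c_day = numbered.pricedate AND days.c_visible)
WHERE numbered.position < 11
ORDER BY numbered.pricedate DESC, numbered.shadow_sum DESC;
$$;
```

It could then be called for any window, e.g. SELECT * FROM top10_per_day(date '2019-05-26', date '2019-06-01'); or with current_date as the end.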

Running total of values from a table until it matches value from another table

I have 2 tables.
Table 1 is a temp variable table:
declare #Temp as table ( proj_num varchar(10), sum_dom decimal(23,8))
My temp table is populated with a list of project numbers, and a month end accounting dollar amount.
For example:
proj_num | sum_dom
11522 | 2477.15
11524 | 26474.20
41865 | 9012.10
Table 2 is a Project Transactions table.
We're concerned with just the following columns:
proj_num
amount
cost_code
tran_date
Individual values will look something like this:
proj_num | cost_code | amount | tran_date
11522 | LBR | 112.10 | 10/1/2018
11522 | LBR | 1765.90 | 10/2/2018
11522 | MAT | 599.15 | 10/3/2018
11522 | FRT | 57.50 | 10/4/2018
So for this project, since the grand total of $2477.15 is met on 10/3, example output would be:
proj_num | cost_code | amount
11522 | LBR | 1878.00
11522 | MAT | 599.15
I want to sum the amounts in the project transactions table (grouped by cost_code and ordered by tran_date) until the running total for that project matches the value in the sum_dom column of the temp table, at which point I will output that data.
Can you help me figure out how to write the query to do that?
I know I should avoid cursors, but I haven't had much luck with my attempts so far. I can't seem to get it to keep a running total.
A running sum is done using SUM(...) OVER (ORDER BY ...). You just need to tell it where to stop:
SELECT sq.*
FROM projects
INNER JOIN (SELECT proj_num,
                   cost_code,
                   amount,
                   SUM(amount) OVER (PARTITION BY proj_num ORDER BY tran_date) AS running_sum
            FROM project_transactions
           ) AS sq ON projects.proj_num = sq.proj_num
WHERE running_sum <= projects.sum_dom
DB Fiddle
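The expected output in the question additionally collapses the rows per cost_code; aggregating the filtered rows gets there. A sketch reusing the same assumed projects / project_transactions names from the answer above:

```sql
-- Sum the rows that fall inside the cutoff, one row per project and cost code.
SELECT sq.proj_num, sq.cost_code, SUM(sq.amount) AS amount
FROM projects
INNER JOIN (SELECT proj_num,
                   cost_code,
                   amount,
                   SUM(amount) OVER (PARTITION BY proj_num ORDER BY tran_date) AS running_sum
            FROM project_transactions
           ) AS sq ON projects.proj_num = sq.proj_num
WHERE sq.running_sum <= projects.sum_dom
GROUP BY sq.proj_num, sq.cost_code;
```

For the sample data this would give LBR 1878.00 and MAT 599.15 for project 11522, as in the expected output; if two transactions share a tran_date at the cutoff, the ORDER BY may need an extra tie-breaker column to make the window deterministic.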