Previous Month MTD - sql

Is there any way I can find the previous month's MTD?
My data is at day level and I need to find both the MTD and the previous month's MTD.
D_DATE      | PRODUCT | TOTAL_UNIT
------------|---------|-----------
01/AUG/2020 | A       | 10
01/AUG/2020 | B       | 20
02/AUG/2020 | A       | 15
02/AUG/2020 | B       | 25
29/JUL/2020 | A       | 5
29/JUL/2020 | B       | 0
30/JUL/2020 | A       | 2
31/JUL/2020 | B       | 30
I can get the current month's MTD using the SQL below (Oracle):
SUM(TOTAL_UNIT) OVER (PARTITION BY PRODUCT, TRUNC(D_DATE,'MM') ORDER BY D_DATE) AS MTD
However, when I apply ADD_MONTHS(..., -1) to get the PMTD, it still shows the current MTD. I tried:
SUM(TOTAL_UNIT) OVER (PARTITION BY PRODUCT, TRUNC(ADD_MONTHS(D_DATE,-1),'MM') ORDER BY D_DATE) AS PMTD
Another way I can do it is with a self-join, but I would like to avoid that for performance reasons.

Changing your partition-by clause to TRUNC(ADD_MONTHS(D_DATE,-1),'MM') - or ADD_MONTHS(TRUNC(D_DATE,'MM'),-1) - gives you a different value for that partitioning key, but exactly the same groups as plain TRUNC(D_DATE,'MM'): every date in a given month shifts to the same earlier month, so the rows are split up identically.
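To see the equivalence, you can print both keys side by side - a quick sketch against the same sample data (your_table as in the question):

SELECT d_date,
       TRUNC(d_date, 'MM')                 AS month_key,
       TRUNC(ADD_MONTHS(d_date, -1), 'MM') AS shifted_key
FROM   your_table
ORDER BY d_date;
-- All August rows get month_key = 01-AUG-20 and shifted_key = 01-JUL-20.
-- Rows that share one key always share the other, so the window
-- partitions (and therefore the running sums) are identical.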
If you want to get the last MTD before the current month, you can put your existing query in a subquery and use last_value() with a window frame that stops just before the current month:
select d_date, product, total_unit, m_date, mtd,
       last_value(mtd) over (partition by product order by m_date
                             range between unbounded preceding and 1 preceding) as prev_mtd
from (
  select d_date, product, total_unit,
         trunc(d_date, 'MM') as m_date,
         sum(total_unit) over (partition by product, trunc(d_date, 'MM') order by d_date) as mtd
  from your_table
)
order by product, d_date;
D_DATE | PRODUCT | TOTAL_UNIT | M_DATE | MTD | PREV_MTD
:-------- | :------ | ---------: | :-------- | --: | -------:
29-JUL-20 | A | 5 | 01-JUL-20 | 5 | null
30-JUL-20 | A | 2 | 01-JUL-20 | 7 | null
01-AUG-20 | A | 10 | 01-AUG-20 | 10 | 7
02-AUG-20 | A | 15 | 01-AUG-20 | 25 | 7
29-JUL-20 | B | 0 | 01-JUL-20 | 0 | null
31-JUL-20 | B | 30 | 01-JUL-20 | 30 | null
01-AUG-20 | B | 20 | 01-AUG-20 | 20 | 30
02-AUG-20 | B | 25 | 01-AUG-20 | 45 | 30

That is because you're using ADD_MONTHS inside the analytic function: you're calculating the sum, partitioned by the month, just labelled as if it were the previous month. While that labelling doesn't make much sense to me, it does give you the correct SUM; you simply never told SQL to show the previous month as well.
If you want the previous month to be shown alongside D_DATE, add another column: trunc(ADD_MONTHS(d_date, -1), 'MM') as pmtd
select trunc( ADD_MONTHS( sysdate, -1), 'MM') from dual;
This will always get you the first date of the previous month.
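Putting the two together - a sketch against the same your_table, adding that column alongside the existing MTD sum:

SELECT d_date, product, total_unit,
       TRUNC(ADD_MONTHS(d_date, -1), 'MM') AS pmtd, -- first day of the previous month
       SUM(total_unit) OVER (PARTITION BY product, TRUNC(d_date, 'MM')
                             ORDER BY d_date) AS mtd
FROM your_table;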

Related

How to generate a date array and forward fill missing data using BigQuery?

I have a table with weeks of missing data (shown below):
week | customer_id | score
-----------|--------------|---------
2019-10-27 | 1 | 3
2019-11-10 | 1 | 4
2019-10-20 | 2 | 5
2019-10-27 | 2 | 8
Therefore I've used BigQuery's GENERATE_DATE_ARRAY function to fill in the missing weeks for each customer (in the range 2019-10-20 to 2019-11-10), which results in a NULL customer_id and score value for those weeks that were missing (shown below).
week | customer_id | score
-----------|--------------|---------
2019-10-20 | NULL | NULL
2019-10-27 | 1 | 3
2019-11-03 | NULL | NULL
2019-11-10 | 1 | 4
2019-10-20 | 2 | 5
2019-10-27 | 2 | 8
2019-11-03 | NULL | NULL
2019-11-10 | NULL | NULL
I want to forward fill the customer_id and score for each customer using the last non-null value so that the table looks like this:
week | customer_id | score
-----------|--------------|---------
2019-10-20 | NULL | NULL
2019-10-27 | 1 | 3
2019-11-03 | 1 | 3
2019-11-10 | 1 | 4
2019-10-20 | 2 | 5
2019-10-27 | 2 | 8
2019-11-03 | 2 | 8
2019-11-10 | 2 | 8
I wrote the query below; however, since the customer_id value is NULL in some rows, I am unable to partition by this field, and those rows keep returning NULL. If I filter for WHERE customer_id = 1 and remove the PARTITION BY clause, I get the desired result, but I cannot get it to work for multiple customers.
WITH weeks AS (
  SELECT week
  FROM UNNEST(GENERATE_DATE_ARRAY('2019-10-20', '2019-11-10', INTERVAL 1 WEEK)) AS week
),
table AS (
  SELECT *, DATE_TRUNC(EXTRACT(DATE FROM created_at), WEEK) AS week
  FROM score
)
SELECT weeks.week,
       COALESCE(table.customer_id, LAST_VALUE(table.customer_id IGNORE NULLS) OVER (PARTITION BY table.customer_id ORDER BY weeks.week)) AS customer_id,
       COALESCE(table.score, LAST_VALUE(table.score IGNORE NULLS) OVER (PARTITION BY table.customer_id ORDER BY weeks.week)) AS score
FROM weeks
LEFT JOIN table
  ON weeks.week = table.week
I am wondering how I can generate this date array for each customer and then somehow forward fill any missing data using the last customer_id and score value for that customer. Any help would be greatly appreciated!
The most efficient way is just to generate the data as you need it:
select the_week, t.customer_id, t.score
from (select date_trunc(extract(date from created_at), week) as week,
             customer_id, score,
             lead(date_trunc(extract(date from created_at), week))
               over (partition by customer_id order by created_at) as next_week
      from score
     ) t cross join
     unnest(generate_date_array(t.week,
                                -- coalesce keeps the customer's final week; a null next_week
                                -- would make the generated array null and drop the row entirely
                                coalesce(date_add(t.next_week, interval -1 week), t.week),
                                interval 1 week
                               )) the_week;
By generating only the dates you need for each week, you don't need to "fill" anything in. The only downside is that you don't get data before the first week. You can fill that in if you really want, but it doesn't seem very useful.
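If you do want a row for every week and customer (as in the desired output above), one sketch - assuming the same score table and created_at column from the question - is to build the full week grid per customer first, so the partitioning column is never NULL, and then forward-fill:

WITH weeks AS (
  SELECT week
  FROM UNNEST(GENERATE_DATE_ARRAY('2019-10-20', '2019-11-10', INTERVAL 1 WEEK)) AS week
),
scored AS (
  SELECT DATE_TRUNC(EXTRACT(DATE FROM created_at), WEEK) AS week, customer_id, score
  FROM score
)
SELECT w.week,
       c.customer_id,
       -- last non-null score up to and including this week, per customer
       LAST_VALUE(s.score IGNORE NULLS)
         OVER (PARTITION BY c.customer_id ORDER BY w.week) AS score
FROM weeks w
CROSS JOIN (SELECT DISTINCT customer_id FROM scored) c
LEFT JOIN scored s
  ON s.customer_id = c.customer_id AND s.week = w.week
ORDER BY c.customer_id, w.week;

Unlike the desired output above, every grid row carries its customer_id here; weeks before a customer's first record simply keep a NULL score.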

30 day rolling count of distinct IDs

After looking at what seems to be a commonly asked question, and not being able to get any of the solutions to work for me, I decided I should ask for myself.
I have a data set with two columns: session_start_time, uid
I am trying to generate a rolling 30 day tally of unique sessions
It is simple enough to query the number of unique uids over the last 30 days:
SELECT COUNT(DISTINCT uid)
FROM segment_clean.users_sessions
WHERE session_start_time >= CURRENT_DATE - INTERVAL '30 days'
It is also relatively simple to calculate the daily unique uids over a date range:
SELECT DATE_TRUNC('day', session_start_time) AS "date",
       COUNT(DISTINCT uid) AS "count"
FROM segment_clean.users_sessions
WHERE session_start_time >= CURRENT_DATE - INTERVAL '90 days'
GROUP BY DATE_TRUNC('day', session_start_time)
I then tried several ways to do a rolling 30-day unique count over a time interval:
SELECT DATE(session_start_time) AS "running30day",
       COUNT(DISTINCT
             CASE WHEN DATE(session_start_time) >= running30day - INTERVAL '30 days'
                   AND DATE(session_start_time) <= running30day
                  THEN uid
             END) AS "unique_30day"
FROM segment_clean.users_sessions
WHERE session_start_time >= CURRENT_DATE - INTERVAL '3 months'
GROUP BY DATE(session_start_time)
ORDER BY running30day DESC
I really thought this would work, but looking at the results it appears I'm getting the same numbers as the daily unique count rather than the unique count over 30 days.
I am writing this query from Metabase using the SQL query editor; the underlying tables are in Redshift.
If you read this far, thank you, your time has value and I appreciate the fact that you have spent some of it to read my question.
EDIT:
As rightfully requested, I added an example of the data set I'm working with and the desired outcome.
+-----+-------------------------------+
| UID | SESSION_START_TIME            |
+-----+-------------------------------+
| 10  | 2020-01-13T01:46:07.000-05:00 |
| 5   | 2020-01-13T01:46:07.000-05:00 |
| 3   | 2020-01-18T02:49:23.000-05:00 |
| 9   | 2020-03-06T18:18:28.000-05:00 |
| 2   | 2020-03-06T18:18:28.000-05:00 |
| 8   | 2020-03-31T23:13:33.000-04:00 |
| 3   | 2020-08-28T18:23:15.000-04:00 |
| 2   | 2020-08-28T18:23:15.000-04:00 |
| 9   | 2020-08-28T18:23:15.000-04:00 |
| 3   | 2020-08-28T18:23:15.000-04:00 |
| 8   | 2020-09-15T16:40:29.000-04:00 |
| 3   | 2020-09-21T20:49:09.000-04:00 |
| 1   | 2020-11-05T21:31:48.000-05:00 |
| 6   | 2020-11-05T21:31:48.000-05:00 |
| 8   | 2020-12-12T04:42:00.000-05:00 |
| 8   | 2020-12-12T04:42:00.000-05:00 |
| 5   | 2020-12-12T04:42:00.000-05:00 |
+-----+-------------------------------+
Below is what the desired result looks like:
+------------+---------------------+
| DATE       | UNIQUE 30 DAY COUNT |
+------------+---------------------+
| 2020-01-13 | 3                   |
| 2020-01-18 | 1                   |
| 2020-03-06 | 3                   |
| 2020-03-31 | 1                   |
| 2020-08-28 | 4                   |
| 2020-09-15 | 2                   |
| 2020-09-21 | 1                   |
| 2020-11-05 | 2                   |
| 2020-12-12 | 2                   |
+------------+---------------------+
Thank you
You can approach this by keeping a counter of when users are counted and then uncounted -- 30 (or perhaps 31) days later. Then, determine the "islands" of being counted, and aggregate. This involves:
1. Unpivoting the data to have an "enters count" and a "leaves count" event for each session.
2. Accumulating the count, so on each day for each user you know whether they are counted or not.
3. Determining where the islands of counting start and stop -- getting rid of all the detritus in-between.
4. Doing a cumulative sum on each date to determine the 30-day sessions.
In SQL, this looks like:
with t as (
select uid, date_trunc('day', session_start_time) as s_day, 1 as inc
from users_sessions
union all
select uid, date_trunc('day', session_start_time) + interval '31 day' as s_day, -1
from users_sessions
),
tt as ( -- increment the ins and outs to determine whether a uid is in or out on a given day
select uid, s_day, sum(inc) as day_inc,
sum(sum(inc)) over (partition by uid order by s_day rows between unbounded preceding and current row) as running_inc
from t
group by uid, s_day
),
ttt as ( -- find the beginning and end of the islands
select tt.uid, tt.s_day,
(case when running_inc > 0 then 1 else -1 end) as in_island
from (select tt.*,
lag(running_inc) over (partition by uid order by s_day) as prev_running_inc,
lead(running_inc) over (partition by uid order by s_day) as next_running_inc
from tt
) tt
where running_inc > 0 and (prev_running_inc = 0 or prev_running_inc is null) or
running_inc = 0 and (next_running_inc > 0 or next_running_inc is null)
)
select s_day,
sum(sum(in_island)) over (order by s_day rows between unbounded preceding and current row) as active_30
from ttt
group by s_day;
I'm pretty sure the easier way to do this is to use a join. This creates a list of all the distinct users who had a session on each day and a list of all distinct dates in the data, then one-to-many joins the user list to the date list and counts the distinct users. The key here is the expanded join criteria that matches a range of dates to a single date via a system of inequalities.
with users as (
  select distinct
         uid,
         date_trunc('day', session_start_time) as dt
  from <table>
  where session_start_time >= '2021-05-01'
),
dates as (
  select distinct
         date_trunc('day', session_start_time) as dt
  from <table>
  where session_start_time >= '2021-05-01'
)
select count(distinct uid),
       dates.dt
from users
join dates
  on users.dt >= dates.dt - 29
 and users.dt <= dates.dt
group by dates.dt
order by dt desc;

How do I create cohorts of users from month of first order, then count information about those orders in SQL?

I'm trying to use SQL to:
Create user cohorts by the month of their first order
Sum the total of all the order amounts bought by that cohort all-time
Output the cohort name (its month), the cohort size (total users who made first purchase in that month), total_revenue (all order revenue from the users in that cohort), and avg_revenue (the total_revenue divided by the cohort size)
Please see below for a SQL Fiddle, with sample tables, and the expected output:
http://www.sqlfiddle.com/#!15/b5937
Thanks!!
Users Table
+-----+---------+
| id | name |
+-----+---------+
| 1 | Adam |
| 2 | Bob |
| 3 | Charles |
| 4 | David |
+-----+---------+
Orders Table
+----+--------------+-------+---------+
| id | date | total | user_id |
+----+--------------+-------+---------+
| 1 | '2020-01-01' | 100 | 1 |
| 2 | '2020-01-02' | 200 | 2 |
| 3 | '2020-03-01' | 300 | 3 |
| 4 | '2020-04-01' | 400 | 1 |
+----+--------------+-------+---------+
Desired Output
+--------------+--------------+----------------+-------------+
| cohort | cohort_size | total_revenue | avg_revenue |
+--------------+--------------+----------------+-------------+
| '2020-01-01' | 2 | 700 | 350 |
| '2020-03-01' | 1 | 300 | 300 |
+--------------+--------------+----------------+-------------+
You can find the minimum order date and the all-time total for every user, then aggregate those per cohort month:
with first_orders(user_id, cohort, total) as (
  select user_id, min(ordered_at), sum(total)
  from orders
  group by user_id
)
select to_char(date_trunc('month', fo.cohort), 'YYYY-MM-DD') as cohort,
       count(fo.user_id) as cohort_size,
       sum(fo.total) as total_revenue,
       avg(fo.total) as avg_revenue
from first_orders fo
group by date_trunc('month', fo.cohort)
You can use window functions to get the first date. The rest is then aggregation:
select date_trunc('month', first_date) as yyyymm,
count(distinct user_id), sum(total), sum(total)/ count(distinct user_id)
from (select o.*, min(o.ordered_at) over (partition by o.user_id) as first_date
from orders o
) o
group by date_trunc('month', first_date)
order by yyyymm;

postgresql - cumul. sum active customers by month (removing churn)

I want to create a query to get the cumulative sum by month of our active customers. The tricky thing here is that (unfortunately) some customers churn and so I need to remove them from the cumulative sum on the month they leave us.
Here is a sample of my customers table :
customer_id | begin_date | end_date
-----------------------------------------
1 | 15/09/2017 |
2 | 15/09/2017 |
3 | 19/09/2017 |
4 | 23/09/2017 |
5 | 27/09/2017 |
6 | 28/09/2017 | 15/10/2017
7 | 29/09/2017 | 16/10/2017
8 | 04/10/2017 |
9 | 04/10/2017 |
10 | 05/10/2017 |
11 | 07/10/2017 |
12 | 09/10/2017 |
13 | 11/10/2017 |
14 | 12/10/2017 |
15 | 14/10/2017 |
Here is what I am looking to achieve :
month | active customers
-----------------------------------------
2017-09 | 7
2017-10 | 6
I've managed to achieve it with the following query. However, I'd like to know if there is a better way.
select
  "begin_date" as "date",
  sum(new_customers.new_customers - coalesce(churn_customers.churn_customers, 0))
    over (order by new_customers."begin_date") as active_customers
from (
  select
    date_trunc('month', begin_date)::date as "begin_date",
    count(id) as new_customers
  from customers
  group by 1
) as new_customers
left join (
  select
    date_trunc('month', end_date)::date as "end_date",
    count(id) as churn_customers
  from customers
  where end_date is not null
  group by 1
) as churn_customers
  on new_customers."begin_date" = churn_customers."end_date"
order by 1;
You may use a CTE to compute the churned customers per month and then subtract that from the count of begin dates by using a left join:
WITH edt AS (
  SELECT to_char(end_date, 'yyyy-mm') AS mon,
         count(*) AS ct
  FROM customers
  WHERE end_date IS NOT NULL
  GROUP BY to_char(end_date, 'yyyy-mm')
)
SELECT to_char(c.begin_date, 'yyyy-mm') AS month,
       COUNT(*) - MAX(COALESCE(ct, 0)) AS active_customers
FROM customers c
LEFT JOIN edt ON to_char(c.begin_date, 'yyyy-mm') = edt.mon
GROUP BY to_char(begin_date, 'yyyy-mm')
ORDER BY month;
Results:
| month | active_customers |
|---------|------------------|
| 2017-09 | 7 |
| 2017-10 | 6 |

SQL query to select today and previous day's price

I have historic stock price data that looks like the below. I want to generate a new table that has one row for each ticker with the most recent day's price and its previous day's price. What would be the best way to do this? My database is Postgres.
+--------+-------+------------+
| ticker | price | date       |
+--------+-------+------------+
| AAPL   | 6     | 10-23-2015 |
| AAPL   | 5     | 10-22-2015 |
| AAPL   | 4     | 10-21-2015 |
| AXP    | 5     | 10-23-2015 |
| AXP    | 3     | 10-22-2015 |
| AXP    | 5     | 10-21-2015 |
+--------+-------+------------+
You can do something like this:
with ranking as (
select ticker, price, dt,
rank() over (partition by ticker order by dt desc) as rank
from stocks
)
select * from ranking where rank in (1,2);
Example: http://sqlfiddle.com/#!15/e45ea/3
Results for your example will look like this:
| ticker | price | dt | rank |
|--------|-------|---------------------------|------|
| AAPL | 6 | October, 23 2015 00:00:00 | 1 |
| AAPL | 5 | October, 22 2015 00:00:00 | 2 |
| AXP | 5 | October, 23 2015 00:00:00 | 1 |
| AXP | 3 | October, 22 2015 00:00:00 | 2 |
If your table is large and you have performance issues, use a WHERE clause to restrict the data to the last 30 days or so.
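For example, a sketch of the same query with the restriction pushed into the CTE (using the stocks table and dt column from the fiddle):

with ranking as (
  select ticker, price, dt,
         rank() over (partition by ticker order by dt desc) as rank
  from stocks
  where dt >= current_date - interval '30 days' -- limit the scan before ranking
)
select * from ranking where rank in (1, 2);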
Your best bet is to use a window function with an aggregated CASE expression, which is used to create a pivot on the data.
You can see more on window functions here: http://www.postgresql.org/docs/current/static/tutorial-window.html
Below is a pseudocode version of where you may need to head to answer your question (sorry I couldn't validate it, as I don't have a Postgres database set up).
SELECT ticker,
       SUM(CASE WHEN rank = 1 THEN price ELSE 0 END) AS today,
       SUM(CASE WHEN rank = 2 THEN price ELSE 0 END) AS yesterday
FROM (
  SELECT ticker,
         price,
         date,
         rank() OVER (PARTITION BY ticker ORDER BY date DESC) AS rank
  FROM your_table) p
WHERE rank IN (1, 2)
GROUP BY ticker;
Edit - Updated the case statement with an 'else'