How to group by week in postgresql - sql

I have a database table commits with the following columns:
id | author_name | author_email | author_date (timestamp) | total_lines
Sample contents are:
1 | abc | abc#xyz.com | 2013-03-24 15:32:49 | 1234
2 | abc | abc#xyz.com | 2013-03-27 15:32:49 | 534
3 | abc | abc#xyz.com | 2014-05-24 15:32:49 | 2344
4 | abc | abc#xyz.com | 2014-05-28 15:32:49 | 7623
I want to get a result as follows:
id | name | week | commits
1 | abc | 1 | 2
2 | abc | 2 | 0
I searched online for similar solutions but couldn't find any helpful ones.
I tried this query:
SELECT date_part('week', author_date::date) AS weekly,
COUNT(author_email)
FROM commits
GROUP BY weekly
ORDER BY weekly
But it's not the right result.

If you have multiple years, you should take the year into account as well. One way is:
SELECT date_part('year', author_date::date) as year,
date_part('week', author_date::date) AS weekly,
COUNT(author_email)
FROM commits
GROUP BY year, weekly
ORDER BY year, weekly;
A more natural way to write this uses date_trunc():
SELECT date_trunc('week', author_date::date) AS weekly,
COUNT(author_email)
FROM commits
GROUP BY weekly
ORDER BY weekly;
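date_trunc() returns a timestamp for the start of each week; if a compact label is preferred instead, the result can be formatted with to_char() (a sketch along the same lines, using the ISO year/week pattern IYYY-IW):

```sql
-- Sketch: one row per ISO week, labelled like '2013-12'
SELECT to_char(date_trunc('week', author_date), 'IYYY-IW') AS year_week,
       COUNT(author_email) AS commits
FROM commits
GROUP BY year_week
ORDER BY year_week;
```

Unlike the two-column year/week version, this label also sorts correctly as text across year boundaries.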

If you want a count for all the intermediate weeks as well, where there are no commits/records, you can get it by passing a start_date and an end_date to the generate_series() function:
SELECT t1.year_week week,
       t2.commit_count
FROM (SELECT week,
             to_char(week, 'IYYY-IW') year_week
      FROM generate_series('2020-02-01 06:06:51.25+00'::date,
                           '2020-04-05 12:12:33.25+00'::date,
                           '1 week'::interval) AS week) t1
LEFT OUTER JOIN (SELECT to_char(author_date, 'IYYY-IW') year_week,
                        COUNT(author_email) commit_count
                 FROM commits
                 GROUP BY year_week) t2
  ON t1.year_week = t2.year_week;
The output will be:
week | commit_count
----------+-------------
2020-05 | 2
2020-06 | NULL
2020-07 | 1
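If 0 is preferred over NULL for the empty weeks, wrapping the joined count in COALESCE() is enough (the same join, only the select list changes):

```sql
-- Sketch: report 0 instead of NULL for weeks with no commits
SELECT t1.year_week AS week,
       COALESCE(t2.commit_count, 0) AS commit_count
FROM (SELECT to_char(week, 'IYYY-IW') AS year_week
      FROM generate_series('2020-02-01 06:06:51.25+00'::date,
                           '2020-04-05 12:12:33.25+00'::date,
                           '1 week'::interval) AS week) t1
LEFT OUTER JOIN (SELECT to_char(author_date, 'IYYY-IW') AS year_week,
                        COUNT(author_email) AS commit_count
                 FROM commits
                 GROUP BY year_week) t2
  ON t1.year_week = t2.year_week;
```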

Related

How do I create cohorts of users from month of first order, then count information about those orders in SQL?

I'm trying to use SQL to:
Create user cohorts by the month of their first order
Sum the total of all the order amounts bought by that cohort all-time
Output the cohort name (its month), the cohort size (total users who made first purchase in that month), total_revenue (all order revenue from the users in that cohort), and avg_revenue (the total_revenue divided by the cohort size)
Please see below for a SQL Fiddle, with sample tables, and the expected output:
http://www.sqlfiddle.com/#!15/b5937
Thanks!!
Users Table
+-----+---------+
| id | name |
+-----+---------+
| 1 | Adam |
| 2 | Bob |
| 3 | Charles |
| 4 | David |
+-----+---------+
Orders Table
+----+--------------+-------+---------+
| id | date | total | user_id |
+----+--------------+-------+---------+
| 1 | '2020-01-01' | 100 | 1 |
| 2 | '2020-01-02' | 200 | 2 |
| 3 | '2020-03-01' | 300 | 3 |
| 4 | '2020-04-01' | 400 | 1 |
+----+--------------+-------+---------+
Desired Output
+--------------+--------------+----------------+-------------+
| cohort | cohort_size | total_revenue | avg_revenue |
+--------------+--------------+----------------+-------------+
| '2020-01-01' | 2 | 700 | 350 |
| '2020-03-01' | 1 | 300 | 300 |
+--------------+--------------+----------------+-------------+
You can find the minimum order date and the order total for every user in one pass, then aggregate again over those dates:
with first_orders(user_id, cohort, total) as (
select user_id, min(ordered_at), sum(total)
from orders
group by user_id
)
select to_char(date_trunc('month', fo.cohort), 'YYYY-MM-DD'), count(fo.user_id), sum(fo.total), avg(fo.total)
from first_orders fo
group by date_trunc('month', fo.cohort)
You can use window functions to get the first date. The rest is then aggregation:
select date_trunc('month', first_date) as yyyymm,
count(distinct user_id), sum(total), sum(total)/ count(distinct user_id)
from (select o.*, min(o.ordered_at) over (partition by o.user_id) as first_date
from orders o
) o
group by date_trunc('month', first_date)
order by yyyymm;
Here is the SQL fiddle.
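Note that both answers reference an ordered_at column from the fiddle schema. Against the column names shown in the sample Orders table above, where the date column is literally named date, the first query might read like this (a sketch, quoting the reserved column name):

```sql
-- Sketch: cohort by month of first order, using the sample table's "date" column
with first_orders(user_id, cohort, total) as (
  select user_id, min("date"), sum(total)
  from orders
  group by user_id
)
select to_char(date_trunc('month', cohort), 'YYYY-MM-DD') as cohort,
       count(user_id) as cohort_size,
       sum(total) as total_revenue,
       avg(total) as avg_revenue
from first_orders
group by date_trunc('month', cohort)
order by cohort;
```

Against the sample data this yields the desired output: cohort 2020-01-01 with size 2, revenue 700, average 350, and cohort 2020-03-01 with size 1, revenue 300, average 300.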

How can I insert data in table form into another table provided some specific conditions are satisfied

Logic: If today is Monday (reference 'time' table), data present in S should be inserted into M (along with a sent_day column which will have today's date).
If today is not Monday, the dates of the current week (same week_id) should be checked in the M table. If any of those dates already exist in M, then S should not be inserted into M; otherwise S should be inserted into M.
time
+------------+------------+----------------+
| cal_dt | cal_day | week_id |
+------------+------------+----------------+
| 2020-03-23 | Monday | 123 |
| 2020-03-24 | Tuesday | 123 |
| 2020-03-25 | Wednesday | 123 |
| 2020-03-26 | Thursday | 123 |
| 2020-03-27 | Friday | 123 |
| 2020-03-30 | Monday | 124 |
| 2020-03-31 | Tuesday    |            124 |
+------------+------------+----------------+
M
+------------+----------+-------+
| sent_day | item | price |
+------------+----------+-------+
| 2020-03-11 | pen | 10 |
| 2020-03-11 | book | 50 |
| 2020-03-13 | Eraser | 5 |
| 2020-03-13 | sharpner | 5 |
+------------+----------+-------+
S
+----------+-------+
| item | price |
+----------+-------+
| pen | 25 |
| book | 20 |
| Eraser | 10 |
| sharpner | 3 |
+----------+-------+
Insert INTO M
SELECT
CASE WHEN(SELECT cal_day FROM time WHERE cal_dt = current_date) = 'Monday' THEN s.*
ELSE
(CASE WHEN(SELECT cal_dt FROM time WHERE wk_id =(SELECT wk_id FROM time WHERE cal_dt = current_date ) NOT IN (SELECT DISTINCT sent_day FROM M) THEN 1 ELSE 0 END)
THEN s.* ELSE END
FROM s
I would do this in two separate INSERT statements:
The first condition ("if today is monday") is quite easy:
insert into m (sent_day, item, price)
select current_date, item, price
from s
where exists (select *
from "time"
where cal_dt = current_date
and cal_day = 'Monday');
I find storing the date and the week day a bit confusing as the week day can easily be extracted from the day. For the test "if today is Monday" it's actually not necessary to consult the "time" table at all:
insert into m (sent_day, item, price)
select current_date, item, price
from s
where extract(dow from current_date) = 1;
The second part is a bit more complicated, but if I understand it correctly, it should be something like this:
insert into m (sent_day, item, price)
select current_date, item, price
from s
where not exists (select *
from m
where m.sent_day in (select cal_dt
from "time" t
where cal_dt = current_date
and cal_day <> 'Monday'));
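The subquery above only looks at today's row. If the requirement is to check every date of the current week (same week_id), not just today, the date list can be widened (a sketch, assuming the time table's week_id column):

```sql
-- Sketch: skip the insert if any date of the current week already appears in m
insert into m (sent_day, item, price)
select current_date, item, price
from s
where extract(dow from current_date) <> 1   -- not Monday
  and not exists (select *
                  from m
                  where m.sent_day in (select cal_dt
                                       from "time"
                                       where week_id = (select week_id
                                                        from "time"
                                                        where cal_dt = current_date)));
```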
If you just want a single INSERT statement, you could simply do a UNION ALL between the two selects:
insert into m (sent_day, item, price)
select current_date, item, price
from s
where extract(dow from current_date) = 1
union all
select current_date, item, price
from s
where not exists (select *
from m
where m.sent_day in (select cal_dt
from "time" t
where cal_dt = current_date
and cal_day <> 'Monday'));

How does DATEADD work when joining the same table with itself?

I have a table with monthly production values.
Example:
Outdate | Prod Value | ID
2/28/19 | 110 | 4180
3/31/19 | 100 | 4180
4/30/19 | 90 | 4180
I also have a table that has monthly forecast values.
Example:
Forecast End Date | Forecast Value | ID
2/28/19 | 120 | 4180
3/31/19 | 105 | 4180
4/30/19 | 80 | 4180
I want to create a table that has a row that contains the ID, the Prod Value, the current month (example: March) forecast, the previous month forecast, the next month forecast.
What I want:
ID | Prod Value | Outdate | Current Forecast | Previous Forecast | Next Forecast
4180 | 100 | 3/31/19 | 105 | 120 | 80
The problem is that when I used DATEADD to bring in the specific value from the Forecast table for the previous month, random months are missing from my final values.
I've tried adding in another LEFT JOIN / INNER JOIN with the DateDimension table when adding in the Next Month and Previous Month forecast, but that either does not solve the problem or adds in too many rows.
My DateDimension table has these columns: DateKey, Date, Day, DaySuffix, Weekday, WeekDayName, IsWeekend, IsHoliday, DOWInMonth, DayOfYear, WeekOfMonth, WeekOfYear, ISOWeekOfYear, Month, MonthName, Quarter, QuarterName, Year, MMYYYY, MonthYear, FirstDayOfMonth, LastDayOfMonth, FirstDayOfQuarter, LastDayOfQuarter, FirstDayOfYear, LastDayOfYear, FirstDayOfNextMonth, FirstDayOfNextYear
My query is along these lines (abbreviated for simplicity)
SELECT A.ArchiveKey, BH.ID, d.[Date], BH.Outdate, BH.ProdValue, BH.Forecast, BHP.Forecast, BHN.Forecast
FROM dbo.BudgetHistory bh
INNER JOIN dbo.DateDimension d ON bh.outdate = d.lastdayofmonth
INNER JOIN dbo.Archive a ON bh.ArchiveKey = a.ArchiveKey
LEFT JOIN dbo.BudgetHistory bhp ON bh.ID = bhp.ID AND bhp.outdate = DATEADD(m, - 1, bh.Outdate)
LEFT JOIN dbo.BudgetHistory bhn ON bh.ID = bhn.ID AND bhn.outdate = DATEADD(m, 1, bh.Outdate)
WHERE bh.ID IS NOT NULL
I get something like this:
+------+------------+---------+------------------+-------------------+---------------+
| ID | Prod Value | Outdate | Current Forecast | Previous Forecast | Next Forecast |
+------+------------+---------+------------------+-------------------+---------------+
| 4180 | 110 | 2/28/19 | 120 | NULL | NULL |
| 4180 | 100 | 3/31/19 | 105 | 120 | 80 |
| 4180 | 90 | 4/30/19 | 80 | NULL | NULL |
+------+------------+---------+------------------+-------------------+---------------+
And the pattern doesn't seem to follow anything reasonable.
I want the values to be filled in for each row.
You could join the tables, then use window functions LEAD() and LAG() to recover the next and previous forecast values:
SELECT
p.ID,
p.ProdValue,
p.Outdate,
f.ForecastValue,
LAG(f.ForecastValue) OVER(PARTITION BY f.ID ORDER BY f.ForecastEndDate) PreviousForecast,
LEAD(f.ForecastValue) OVER(PARTITION BY f.ID ORDER BY f.ForecastEndDate) NextForecast
FROM prod p
INNER JOIN forecast f ON p.ID = f.ID AND p.Outdate = f.ForecastEndDate
This demo on DB Fiddle with your sample data returns:
ID | ProdValue | Outdate | ForecastValue | PreviousForecast | NextForecast
---: | --------: | :------------------ | ------------: | ---------------: | -----------:
4180 | 110 | 28/02/2019 00:00:00 | 120 | null | 105
4180 | 100 | 31/03/2019 00:00:00 | 105 | 120 | 80
4180 | 90 | 30/04/2019 00:00:00 | 80 | 105 | null
DATEADD only does end of month adjustments if the newly calculated value isn't a valid date. So DATEADD(month,-1,'20190331') produces 28th February. But DATEADD(month,-1,'20190228') produces 28th January, not the 31st.
I would probably go with GMB's answer. If you want to do something DATEADD based though, you can use:
bhp.outdate = DATEADD(month, DATEDIFF(month,'20010131', bh.Outdate) ,'20001231')
This always works out the last day of the previous month from bh.OutDate, but it does it by computing it as an offset from a fixed date, and then applying that offset to a different fixed date.
You can just reverse the places of 20010131 and 20001231 to compute the month after rather than the month before. There's no significance about them other than them both having 31 days and having the "one month apart" relationship we're wishing to apply.
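On SQL Server 2012 or later, EOMONTH() expresses the same "last day of the adjacent month" idea more directly; the join conditions might then look like this (a sketch against the same BudgetHistory columns):

```sql
-- EOMONTH(date, n) returns the last day of the month n months away
LEFT JOIN dbo.BudgetHistory bhp
       ON bh.ID = bhp.ID
      AND bhp.outdate = EOMONTH(bh.Outdate, -1)  -- last day of previous month
LEFT JOIN dbo.BudgetHistory bhn
       ON bh.ID = bhn.ID
      AND bhn.outdate = EOMONTH(bh.Outdate, 1)   -- last day of next month
```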

postgresql - cumul. sum active customers by month (removing churn)

I want to create a query to get the cumulative sum by month of our active customers. The tricky thing here is that (unfortunately) some customers churn and so I need to remove them from the cumulative sum on the month they leave us.
Here is a sample of my customers table :
customer_id | begin_date | end_date
-----------------------------------------
1 | 15/09/2017 |
2 | 15/09/2017 |
3 | 19/09/2017 |
4 | 23/09/2017 |
5 | 27/09/2017 |
6 | 28/09/2017 | 15/10/2017
7 | 29/09/2017 | 16/10/2017
8 | 04/10/2017 |
9 | 04/10/2017 |
10 | 05/10/2017 |
11 | 07/10/2017 |
12 | 09/10/2017 |
13 | 11/10/2017 |
14 | 12/10/2017 |
15 | 14/10/2017 |
Here is what I am looking to achieve :
month | active customers
-----------------------------------------
2017-09 | 7
2017-10 | 6
I've managed to achieve it with the following query ... However, I'd like to know if there is a better way.
select
"begin_date" as "date",
sum((new_customers.new_customers-COALESCE(churn_customers.churn_customers,0))) OVER (ORDER BY new_customers."begin_date") as active_customers
FROM (
select
date_trunc('month',begin_date)::date as "begin_date",
count(id) as new_customers
from customers
group by 1
) as new_customers
LEFT JOIN(
select
date_trunc('month',end_date)::date as "end_date",
count(id) as churn_customers
from customers
where
end_date is not null
group by 1
) as churn_customers on new_customers."begin_date" = churn_customers."end_date"
order by 1
;
You may use a CTE to count the churned customers (end_date) per month and then subtract that from the count of new customers (begin_date) using a left join.
SQL Fiddle
Query 1:
WITH edt
AS (
SELECT to_char(end_date, 'yyyy-mm') AS mon
,count(*) AS ct
FROM customers
WHERE end_date IS NOT NULL
GROUP BY to_char(end_date, 'yyyy-mm')
)
SELECT to_char(c.begin_date, 'yyyy-mm') as month
,COUNT(*) - MAX(COALESCE(ct, 0)) AS active_customers
FROM customers c
LEFT JOIN edt ON to_char(c.begin_date, 'yyyy-mm') = edt.mon
GROUP BY to_char(begin_date, 'yyyy-mm')
ORDER BY month;
Results:
| month | active_customers |
|---------|------------------|
| 2017-09 | 7 |
| 2017-10 | 6 |
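Another way to frame it (a sketch, not taken from the answers above) is to treat each begin_date as a +1 event and each end_date as a -1 event, then sum the deltas per month, which matches the output above; a true running total only needs a window sum on top:

```sql
-- Sketch: net active customers per month from +1/-1 events
SELECT to_char(month, 'YYYY-MM') AS month,
       SUM(delta) AS active_customers
       -- for a running total instead: SUM(SUM(delta)) OVER (ORDER BY month)
FROM (SELECT date_trunc('month', begin_date) AS month, 1 AS delta
      FROM customers
      UNION ALL
      SELECT date_trunc('month', end_date), -1
      FROM customers
      WHERE end_date IS NOT NULL) events
GROUP BY month
ORDER BY month;
```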

SQL - Count number of transactions if at least one of the transactions is today

I have a database of transactions made by customers such that each transaction has a specific date. I need to count the number of transactions made by each customer in the last two months ONLY if there was a transaction made by the customer today.
I have been thinking that it requires a WHERE clause to set the complete two-month range and a HAVING clause to make sure the newest date (the MAX of that customer's transactions) equals today's date, but I cannot seem to get it to work. Does this sound like the correct way to go about this problem, or is there a better way?
Thank you!
You don't provide any information about how your schema is, but I assume you have a Customer table and a Transaction table. Consider this example with 4 customers and 12 transactions.
Customers
| id | name |
|----|----------|
| 1 | Google |
| 2 | Facebook |
| 3 | Hooli |
| 4 | Yahoo! |
Transactions
| id | transaction_date | customer_id |
|----|------------------|-------------|
| 1 | 2017-04-15 | 1 |
| 2 | 2017-06-24 | 1 |
| 3 | 2017-07-09 | 1 |
| 4 | 2017-07-24 | 1 |
| 5 | 2017-07-23 | 2 |
| 6 | 2017-07-22 | 2 |
| 7 | 2017-07-21 | 2 |
| 8 | 2017-07-24 | 2 |
| 9 | 2017-07-24 | 3 |
| 10 | 2017-07-23 | 4 |
| 11 | 2017-07-22 | 4 |
| 12 | 2017-07-21 | 4 |
To count the number of transactions the last two months by each customer a simple group by will do the job:
select name, count(*) as number_of_transactions
from transactions t
inner join customers c on c.id = t.customer_id
where t.transaction_date > dateadd(month, -2, getdate())
group by c.name
This yields
| name | number_of_transactions |
|----------|------------------------|
| Facebook | 4 |
| Google | 3 |
| Hooli | 1 |
| Yahoo! | 3 |
To retrieve only customers that have a transaction with a transaction_date equal to today, we can use an EXISTS to check whether such a row exists.
select name, count(*) as number_of_transactions
from transactions t
inner join customers c on c.id = t.customer_id
where t.transaction_date > dateadd(month, -2, getdate())
and exists(select *
from transactions
where customer_id = t.customer_id
and transaction_date = convert(date, getdate()))
group by c.name
So if a row exists in the transactions table with a transaction_date equal to today and a customer_id equal to the customer_id from the main query, the customer is included in the result. Running that query (given that 24th July is today) gives us this result:
| name | number_of_transactions |
|----------|------------------------|
| Facebook | 4 |
| Google | 3 |
| Hooli | 1 |
Check out this sql fiddle http://sqlfiddle.com/#!6/710c94/13
You can toss a subquery in your WHERE clause to find customers that have had sales today:
SELECT count(*) /*count of transactions*/
FROM transactions
WHERE
/*Transactions in the last two months*/
transaction_date > DATEADD(mm, -2, GETDATE())
/*For customers that have had a sale today*/
AND customer_number in (SELECT customer_number FROM transactions WHERE transaction_date = CONVERT(date, GETDATE()));
Totally guessing at your table structure, table name, and field names, but this should get you close.
Alternatively, you can try an inner self-join:
SELECT t2.CustomerID,count(*) as TransactionsCount
FROM [Tansaction] t1 INNER JOIN [Tansaction] t2
ON t1.CustomerID= t2.CustomerID
WHERE CONVERT(date,t1.TransactionDateTime) = CONVERT(date,GETDATE())
AND t2.TransactionDateTime>= DATEADD(mm, -2, GETDATE())
GROUP BY t2.CustomerID
First, you would need to get the list of customers that had made a transaction today. I'm assuming you have a 'transactiontable' that contains transaction dates and customer details.
Do a select from this transactiontable using the following method:
Select customer, count(distinct transactiondate) as transaction_count
from Transactiontable
where transactiondate > dateadd(month, -2, getdate())
and customer in (select customer from transactiontable
                 where cast(transactiondate as date) = cast(getdate() as date))
group by customer