SQL monthly rolling sum

I am trying to calculate monthly balances of bank accounts from the following PostgreSQL table of transactions:
# \d transactions
              View "public.transactions"
 Column |       Type       | Collation | Nullable | Default
--------+------------------+-----------+----------+---------
 year   | double precision |           |          |
 month  | double precision |           |          |
 bank   | text             |           |          |
 amount | numeric          |           |          |
By "rolling sum" I mean that the sum for a given month should include all transactions from the beginning of time up to the end of that month, not just the transactions in that month.
I came up with the following query:
select
a.year, a.month, a.bank,
(select sum(b.amount) from transactions b
where b.year < a.year
or (b.year = a.year and b.month <= a.month))
from
transactions a
order by
bank, year, month;
The problem is that the result contains one row per transaction, so each bank and month appears as many times as there were transactions in it: several rows if there were several transactions, and no row at all if there were none.
I would like a query that returns exactly one row for each bank and month, covering the whole interval from the month of the first transaction to the month of the last one.
How to do that?
An example dataset and a query can be found at https://rextester.com/WJP53830 , courtesy of @a_horse_with_no_name

You need to generate a list of months first, then you can outer join your transactions table to that list.
with all_years as (
  select y.year, m.month, b.bank
  from generate_series(2010, 2019) as y(year) --<< adjust here for your desired range of years
  cross join generate_series(1, 12) as m(month)
  cross join (select distinct bank from transactions) as b(bank)
)
select ay.year, ay.month, ay.bank,
       -- aggregate per month first, so months with several transactions yield one row
       sum(sum(t.amount)) over (partition by ay.bank order by ay.year, ay.month) as balance
from all_years ay
  left join transactions t
    on (ay.year, ay.month, ay.bank) = (t.year::int, t.month::int, t.bank)
group by ay.year, ay.month, ay.bank
order by ay.bank, ay.year, ay.month;
The cross join with all banks is necessary so that the all_years CTE will also contain a bank for each month row.
Online example: https://rextester.com/ZZBVM16426
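For readers without PostgreSQL at hand, here is a minimal sketch of the same idea (build a month grid, cross join it with the banks, left join the transactions, aggregate per month, then take a running SUM() OVER) using Python's built-in sqlite3 module. It is an illustration under assumptions, not the answer's exact query: SQLite has no generate_series, so a recursive CTE generates the months, and the range is taken from the first and last transaction instead of hard-coded years. The sample rows are made up, and window functions need SQLite 3.25 or later.

```python
import sqlite3

# Hypothetical sample data: (year, month, bank, amount).
# Bank "A" has two transactions in January and none in March.
ROWS = [
    (2019, 1, "A", 100), (2019, 1, "A", 50),
    (2019, 2, "A", -30),
    (2019, 4, "A", 20),
    (2019, 2, "B", 10),
]

QUERY = """
WITH RECURSIVE bounds AS (
    -- first and last month with any transaction, encoded as year*12 + (month-1)
    SELECT MIN(year * 12 + month - 1) AS lo, MAX(year * 12 + month - 1) AS hi
    FROM transactions
),
months(m) AS (
    SELECT lo FROM bounds
    UNION ALL
    SELECT m + 1 FROM months, bounds WHERE m < hi
),
grid AS (
    -- one row per bank and month in the interval
    SELECT m / 12 AS year, m % 12 + 1 AS month, b.bank
    FROM months CROSS JOIN (SELECT DISTINCT bank FROM transactions) AS b
),
monthly AS (
    SELECT g.year, g.month, g.bank, COALESCE(SUM(t.amount), 0) AS total
    FROM grid g
    LEFT JOIN transactions t
      ON t.year = g.year AND t.month = g.month AND t.bank = g.bank
    GROUP BY g.year, g.month, g.bank
)
SELECT year, month, bank,
       SUM(total) OVER (PARTITION BY bank ORDER BY year, month) AS balance
FROM monthly
ORDER BY bank, year, month
"""

def monthly_balances(rows):
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE transactions (year INT, month INT, bank TEXT, amount NUMERIC)")
    con.executemany("INSERT INTO transactions VALUES (?, ?, ?, ?)", rows)
    return con.execute(QUERY).fetchall()

for row in monthly_balances(ROWS):
    print(row)
```

Bank B gets a zero balance for January because the grid spans the whole interval for every bank, and a month without transactions (March for A) simply repeats the previous balance.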

Here is my suggestion in Oracle 10 SQL:
select a.year, a.month, a.bank,
       (select sum(b.amount)
        from (select c.year, c.month, c.bank, sum(c.amount) as amount
              from transactions c
              group by c.year, c.month, c.bank) b
        where b.bank = a.bank
          and (b.year < a.year or (b.year = a.year and b.month <= a.month))) as balance
from (select distinct year, month, bank from transactions) a
order by bank, year, month;

Consider aggregating all transactions by bank and month first, then running a window SUM() OVER() to get the rolling monthly sum since the earliest transaction.
WITH agg AS (
SELECT t.year, t.month, t.bank, SUM(t.amount) AS Sum_Amount
FROM transactions t
GROUP BY t.year, t.month, t.bank
)
SELECT agg.year, agg.month, agg.bank,
SUM(agg.Sum_Amount) OVER (PARTITION BY agg.bank ORDER BY agg.year, agg.month) AS rolling_sum
FROM agg
ORDER BY agg.year, agg.month, agg.bank
Should you want YTD rolling sums, adjust the OVER() clause by adding year to the PARTITION BY:
SUM(agg.Sum_Amount) OVER (PARTITION BY agg.bank, agg.year ORDER BY agg.month)
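A small sketch of the aggregate-then-window approach, including the YTD variant, run through Python's sqlite3 (SQLite 3.25+ for window functions). The table layout follows the question; the sample rows are hypothetical and deliberately cross a year boundary so that the two sums differ.

```python
import sqlite3

# Hypothetical transactions: (year, month, bank, amount)
ROWS = [
    (2019, 11, "A", 100), (2019, 11, "A", 20),
    (2019, 12, "A", -50),
    (2020, 1, "A", 30),
]

QUERY = """
WITH agg AS (
    -- one row per bank and month
    SELECT year, month, bank, SUM(amount) AS sum_amount
    FROM transactions
    GROUP BY year, month, bank
)
SELECT year, month, bank,
       SUM(sum_amount) OVER (PARTITION BY bank ORDER BY year, month) AS rolling_sum,
       SUM(sum_amount) OVER (PARTITION BY bank, year ORDER BY month) AS ytd_sum
FROM agg
ORDER BY bank, year, month
"""

def rolling_and_ytd(rows):
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE transactions (year INT, month INT, bank TEXT, amount NUMERIC)")
    con.executemany("INSERT INTO transactions VALUES (?, ?, ?, ?)", rows)
    return con.execute(QUERY).fetchall()

for row in rolling_and_ytd(ROWS):
    print(row)
# (2019, 11, 'A', 120, 120)
# (2019, 12, 'A', 70, 70)
# (2020, 1, 'A', 100, 30)
```

Note that this returns only months that have at least one transaction; months without transactions are not filled in.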

Related

Retrieve Customers with a Monthly Order Frequency greater than 4

I am trying to optimize the query below to fetch all customers who have had 4 or more orders in each of the past three months.
Customer ID | Feb | Mar | Apr
0001        | 4   | 5   | 6
0002        | 3   | 2   | 4
0003        | 4   | 2   | 3
In the table above, only the customer with ID 0001 should be picked, as they consistently have 4 or more orders a month.
Below is the query I have written. It returns all customers with an average purchase frequency of at least 4 over the last 90 days, but it does not check that there were 4 or more purchases in each of the last three months.
Query:
SELECT distinct lines.customer_id Customer_ID, (COUNT(lines.order_id)/90) PurchaseFrequency
from fct_customer_order_lines lines
LEFT JOIN product_table product
ON lines.entity_id= product.entity_id
AND lines.vendor_id= product.vendor_id
WHERE LOWER(product.country_code) = "in"
AND lines.date >= DATE_SUB(CURRENT_DATE() , INTERVAL 90 DAY )
AND lines.date < CURRENT_DATE()
GROUP BY Customer_ID
HAVING PurchaseFrequency >=4;
I tried using window functions, but I'm not sure they are needed in this case.
I would count the orders per month instead of computing an average, and then keep the customers whose monthly count is at least 4 in each of the last three months.
Also, I think you should select whole calendar months (e.g. based on "month(CURRENT_DATE()) - 3") instead of a 90-day window. If you do, handle the case where the current month is January, February or March, which requires wrapping back into the previous year (e.g. October to December when the current month is January).
I'm not familiar with Google BigQuery so I can't write your query but I hope this helps.
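The year wrap mentioned above is easy to get wrong. One way to handle it, sketched here in Python rather than BigQuery SQL, is to convert (year, month) into a single running month number, subtract, and convert back. The function name is hypothetical.

```python
from datetime import date
from typing import List, Tuple

def previous_full_months(today: date, count: int = 3) -> List[Tuple[int, int]]:
    """Return (year, month) pairs for the `count` complete calendar months before `today`."""
    # Encode (year, month) as one running month number so that
    # subtracting across a year boundary needs no special case.
    current = today.year * 12 + (today.month - 1)
    result = []
    for back in range(count, 0, -1):
        index = current - back
        result.append((index // 12, index % 12 + 1))
    return result

print(previous_full_months(date(2022, 5, 10)))  # [(2022, 2), (2022, 3), (2022, 4)]
print(previous_full_months(date(2022, 2, 1)))   # [(2021, 11), (2021, 12), (2022, 1)]
```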
I found a solution using a WITH clause:
WITH filtered_orders AS (
  select
    customer_id ID,
    extract(MONTH from date) Order_Month,
    count(order_id) CountofOrders
  from `customer_order_lines` lines
  where EXTRACT(YEAR FROM date) = 2022 AND EXTRACT(MONTH FROM date) IN (2,3,4)
  group by ID, Order_Month
  having CountofOrders >= 4)
select ID
from filtered_orders
group by ID
having count(Order_Month) = 3;
Hope this helps!
One option is to first count the orders per month and then keep only the customers whose count is at or above your threshold in every month:
WITH ORDERS_BY_MONTH AS (
  SELECT
    DATE_TRUNC(lines.date, MONTH) PurchaseMonth,
    lines.customer_id Customer_ID,
    COUNT(lines.order_id) PurchaseFrequency
  FROM fct_customer_order_lines lines
  LEFT JOIN product_table product
    ON lines.entity_id = product.entity_id
    AND lines.vendor_id = product.vendor_id
  WHERE LOWER(product.country_code) = "in"
    -- the three full calendar months before the current one
    AND lines.date >= DATE_SUB(DATE_TRUNC(CURRENT_DATE(), MONTH), INTERVAL 3 MONTH)
    AND lines.date < DATE_TRUNC(CURRENT_DATE(), MONTH)
  GROUP BY PurchaseMonth, Customer_ID
)
SELECT
  Customer_ID,
  AVG(PurchaseFrequency) AvgPurchaseFrequency
FROM ORDERS_BY_MONTH
GROUP BY Customer_ID
-- orders in all three months, and at least 4 in each of them
HAVING COUNT(1) = 3 AND COUNTIF(PurchaseFrequency >= 4) = 3
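Here is a runnable sketch of the count-per-month-then-filter idea, using Python's sqlite3 with the sample counts from the question plus a hypothetical customer 0004 who only ordered in April. SQLite has no COUNTIF, so SUM(n >= 4) stands in for it, and the three months are hard-coded as 'YYYY-MM' strings (an assumption for the sketch). Requiring every one of the three months to be present is what excludes customers who skipped a month entirely.

```python
import sqlite3

# Monthly order counts from the question, plus a hypothetical customer 0004
# who only ordered in April.
COUNTS = {
    "0001": (4, 5, 6),
    "0002": (3, 2, 4),
    "0003": (4, 2, 3),
    "0004": (0, 0, 5),
}
MONTHS = ("2022-02", "2022-03", "2022-04")

QUERY = """
WITH orders_by_month AS (
    SELECT customer_id, order_month, COUNT(order_id) AS n
    FROM orders
    WHERE order_month IN ('2022-02', '2022-03', '2022-04')
    GROUP BY customer_id, order_month
)
SELECT customer_id
FROM orders_by_month
GROUP BY customer_id
-- present in all three months, and at least 4 orders in each (SUM(...) replaces COUNTIF)
HAVING COUNT(*) = 3 AND SUM(n >= 4) = COUNT(*)
ORDER BY customer_id
"""

def consistent_customers(counts):
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE orders (order_id INTEGER, customer_id TEXT, order_month TEXT)")
    order_id = 0
    for customer, per_month in counts.items():
        for month, n in zip(MONTHS, per_month):
            for _ in range(n):  # one row per order
                order_id += 1
                con.execute("INSERT INTO orders VALUES (?, ?, ?)", (order_id, customer, month))
    return [row[0] for row in con.execute(QUERY)]

print(consistent_customers(COUNTS))  # ['0001']
```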

Period over period SQL script?

I have a dataset with two columns, 'Date' and 'Total Sales'. My dates are 01-01-2021, 02-01-2021, and so on up to 12-01-2022. I want to add a "previous month" column that shows the total sales of the previous month in the same row as the current month (or null when there is no previous month). For example, if my Date column has the rows 01-01-2021 and 02-01-2021 with total sales of $10 and $20 respectively, how can I create a column like the following:
Date       | Sales | Previous Month Sales
-----------+-------+---------------------
01-01-2021 | $10   | null
02-01-2021 | $20   | $10
So on and so forth; this is my query:
CASE
WHEN `Date` > DATE_SUB(`Date`, INTERVAL 1 MONTH)
THEN `Monthly Sales`
ELSE 'null'
END
Thanks in advance
Domo's back end runs on MySQL (at least, that's what I recall from the last time I touched Domo, in 2018).
I think this is just a SQL question, and I wonder if a simple windowing function would do the trick.
select `Date`,
       Sales,
       lag(Sales) over (order by `Date`) as `Previous Month Sales`
from your_table
This takes the value from the previous row, so it assumes exactly one row per month with no gaps. If months can be missing, compare the previous row's month with the current month minus one, using whatever date functions the SQL dialect Domo uses nowadays provides.
Cheers
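A windowing function that fits this case is LAG(), which returns a value from the previous row. A minimal sketch with Python's sqlite3 (SQLite 3.25+), assuming exactly one row per month; the January and February values come from the question, the March row and the table name are made up:

```python
import sqlite3

# Sample values from the question, plus a made-up March row; one row per month.
ROWS = [("2021-01-01", 10), ("2021-02-01", 20), ("2021-03-01", 15)]

def with_previous_month(rows):
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE monthly_sales (sale_date TEXT, sales INTEGER)")
    con.executemany("INSERT INTO monthly_sales VALUES (?, ?)", rows)
    return con.execute("""
        SELECT sale_date,
               sales,
               LAG(sales) OVER (ORDER BY sale_date) AS previous_month_sales
        FROM monthly_sales
        ORDER BY sale_date
    """).fetchall()

for row in with_previous_month(ROWS):
    print(row)
# ('2021-01-01', 10, None)
# ('2021-02-01', 20, 10)
# ('2021-03-01', 15, 20)
```

Because LAG() looks at the previous row rather than the previous calendar month, a missing month would silently shift the values; joining on the date shifted by one month avoids that.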
I think Domo supports a MySQL-like dialect, so you could do something like this:
with cte as
(
select date,
date + interval 1 month as next_month,
sales
from sales
)
select a.date,
a.sales as current_sales,
b.sales as prior_month_sales
from sales a
left join cte b
on b.next_month = a.date
order by a.date
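The same self-join, sketched in SQLite via Python's sqlite3, where date(x, '+1 month') plays the role of MySQL's date + interval 1 month. The sample rows are hypothetical and skip March on purpose, so April gets null rather than February's value.

```python
import sqlite3

# Hypothetical rows; March is missing on purpose.
ROWS = [("2021-01-01", 10), ("2021-02-01", 20), ("2021-04-01", 40)]

QUERY = """
WITH cte AS (
    SELECT sale_date,
           date(sale_date, '+1 month') AS next_month,   -- MySQL: date + interval 1 month
           sales
    FROM sales
)
SELECT a.sale_date,
       a.sales AS current_sales,
       b.sales AS prior_month_sales
FROM sales a
LEFT JOIN cte b ON b.next_month = a.sale_date
ORDER BY a.sale_date
"""

def prior_month_sales(rows):
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE sales (sale_date TEXT, sales INTEGER)")
    con.executemany("INSERT INTO sales VALUES (?, ?)", rows)
    return con.execute(QUERY).fetchall()

for row in prior_month_sales(ROWS):
    print(row)
# ('2021-01-01', 10, None)
# ('2021-02-01', 20, 10)
# ('2021-04-01', 40, None)
```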
I do this by joining the table to itself with a LEFT OUTER JOIN, which keeps a null previous-month value when there is no matching row. The join condition compares the year and month of each row's previous month, computed with EOMONTH(date, -1), with the year and month of the other row. Using EOMONTH() handles the year boundary, so January is matched with December of the previous year.
IF OBJECT_ID('TEMPDB..#TEMP') IS NOT NULL
DROP TABLE #TEMP
CREATE TABLE #TEMP(
[Date] DATE
,[Sales] INT
)
INSERT INTO #TEMP([Date],[Sales])
VALUES ('2020-12-20',50)
,('2021-01-20',100)
,('2021-02-20',200)
,('2021-03-20',300)
,('2021-04-20',400)
,('2021-05-20',500)
SELECT #TEMP.[Date]
,#TEMP.Sales
,TEMPII.Date [PREV M]
,TEMPII.Sales [PREV M SALES]
FROM #TEMP
LEFT OUTER JOIN #TEMP TEMPII
ON YEAR(EOMONTH(#TEMP.[Date],-1))*100+MONTH(EOMONTH(#TEMP.[Date],-1)) = YEAR(TEMPII.[Date])*100+MONTH(TEMPII.[Date])
ORDER BY #TEMP.[Date]

Compare two dates and compute date difference

I have two tables to compare: a Schedule table and an Official receipt table.
I want to know whether the client is paying according to his schedule by comparing the dates in the Official receipt table with those in the Schedule table. If not, a penalty of $10 per day is charged, counting the days from the scheduled date.
Example: the first scheduled payment is on 2019-11-02, but the OR shows he paid on 2019-12-10, which is 38 days after the scheduled date, so a penalty applies. Any ideas? Thank you.
I want something like this:
Loanid | PaymentSched | Date OR | Past Due | Penalty
H1807.0008 | 2019-11-02 | 2019-12-10 | 38 Days | 380
Assuming there are no missed payments and no partial payments, one option is to enumerate the scheduled payments and the receipts with row_number(), then join them on that number. The rest is just filtering on late payments and computing the days late and the penalty:
select
s.loan_id,
s.date_payment,
r.date_or,
datediff(day, s.date_payment, r.date_or) as past_due_days,
10 * datediff(day, s.date_payment, r.date_or) as penalty
from (
select s.*, row_number() over(partition by loan_id order by date_payment) rn
from schedule s
where total_payment > 0
) s
inner join (
select r.*, row_number() over(partition by loan_id order by date_or) rn
from official_receipt r
) r on s.loan_id = r.loan_id and s.rn = r.rn and s.total_payment = r.amount
where r.date_or > s.date_payment
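A runnable sketch of the row_number() pairing in SQLite via Python's sqlite3. SQLite has no DATEDIFF, so the day difference is computed with julianday(). The table and column names follow the answer; the amounts and the second schedule/receipt pair are invented, and only the first pair matches the example from the question (38 days late, 380 penalty).

```python
import sqlite3

SCHEDULE = [  # (loan_id, date_payment, total_payment)
    ("H1807.0008", "2019-11-02", 1000),
    ("H1807.0008", "2020-01-02", 1000),   # hypothetical second instalment
]
RECEIPTS = [  # (loan_id, date_or, amount)
    ("H1807.0008", "2019-12-10", 1000),
    ("H1807.0008", "2019-12-30", 1000),   # hypothetical, paid early
]

QUERY = """
SELECT s.loan_id,
       s.date_payment,
       r.date_or,
       CAST(julianday(r.date_or) - julianday(s.date_payment) AS INTEGER) AS past_due_days,
       10 * CAST(julianday(r.date_or) - julianday(s.date_payment) AS INTEGER) AS penalty
FROM (
    SELECT s.*, ROW_NUMBER() OVER (PARTITION BY loan_id ORDER BY date_payment) AS rn
    FROM schedule s
    WHERE total_payment > 0
) s
JOIN (
    SELECT r.*, ROW_NUMBER() OVER (PARTITION BY loan_id ORDER BY date_or) AS rn
    FROM official_receipt r
) r ON s.loan_id = r.loan_id AND s.rn = r.rn AND s.total_payment = r.amount
WHERE r.date_or > s.date_payment   -- ISO date strings compare correctly as text
"""

def late_payments():
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE schedule (loan_id TEXT, date_payment TEXT, total_payment INTEGER)")
    con.execute("CREATE TABLE official_receipt (loan_id TEXT, date_or TEXT, amount INTEGER)")
    con.executemany("INSERT INTO schedule VALUES (?, ?, ?)", SCHEDULE)
    con.executemany("INSERT INTO official_receipt VALUES (?, ?, ?)", RECEIPTS)
    return con.execute(QUERY).fetchall()

print(late_payments())  # [('H1807.0008', '2019-11-02', '2019-12-10', 38, 380)]
```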
DATEDIFF will help you:
select datediff(day,'2019-11-02','2019-12-10')
select [Loanid],[PaymentSched],[Date OR],datediff(day,[PaymentSched],[Date OR]) as [Past Due],datediff(day,[PaymentSched],[Date OR])*10 as Penalty
from Schedule s
join [Official receipt] o on o.[Loanid]=s.[Loanid]

Using Over & Partition in SQL

I have a SQL table with monthly prices for products from different categories (e.g. fruits, vegetables, dairy).
I'd like to calculate a running monthly average for a specific category and for all the products in the same month, in the same query.
So combining:
Select date, avg(price) from t where category = 'Fruit' group by date
Select date, avg(price) from t group by date
Is that possible to do using OVER and PARTITION BY (or any other way, for that matter)?
Edit
I am using MS SQL
My data is monthly, so I don't need to extract month-end dates; grouping on date already gives me month-end data.
As an example, if my table looks like this:
| Date | Item   | Category | Price |
| Jan  | Banana | Fruit    | 10    |
| Jan  | Potato | Veg      | 20    |
Then the output would be
| Date | Fruit Avg | Overall Avg |
| Jan  | 10        | 15          |
Apologies in advance for mangling the tables, but that's for another thread.
Thanks
Why do you need OVER?
Select date, category, avg(price) from t group by date, category
Try something like this:
Select extract(year from date)||extract(month from date) as month,
category, avg(price) as avg_price
from sales
group by month, category
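P.S. Using the alias month in GROUP BY may not be allowed in some database systems; in that case, replace month with extract(year from date)||extract(month from date) in the GROUP BY list. The concatenation operator also differs between systems: in SQL Server, use a plus sign (+) instead of pipes (||), but convert the parts to strings first, because + on two numbers adds them instead of concatenating. SQL Server also has no EXTRACT, so use YEAR(date) and MONTH(date) there.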
If you want one row per month and the data has multiple rows, then you need to aggregate:
select date_trunc('month', date) as month,
avg(price) as avg_per_month,
avg(avg(price)) over (order by date_trunc('month', date))
from t
where category = 'Fruit'
group by date_trunc('month', date)
order by date_trunc('month', date);
If you only have one row per month, then you can do:
select date, price,
avg(price) over (order by date)
from t
where category = 'Fruit'
order by date;
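None of the queries above returns the fruit average and the overall average side by side. One way to do that in a single query, without OVER, is conditional aggregation: AVG ignores NULLs, so AVG(CASE WHEN category = 'Fruit' THEN price END) averages only the fruit rows. Sketched here with Python's sqlite3; the table and column names and the February rows are hypothetical. The same pattern carries over to SQL Server, but cast price to a decimal there if it is an integer column, since SQL Server's AVG of integers returns an integer.

```python
import sqlite3

# Hypothetical monthly prices; the January rows match the question's example.
ROWS = [
    ("2021-01-31", "Banana", "Fruit", 10),
    ("2021-01-31", "Potato", "Veg", 20),
    ("2021-02-28", "Apple", "Fruit", 14),
    ("2021-02-28", "Carrot", "Veg", 30),
]

QUERY = """
SELECT month_end,
       -- non-fruit rows become NULL and are ignored by AVG
       AVG(CASE WHEN category = 'Fruit' THEN price END) AS fruit_avg,
       AVG(price) AS overall_avg
FROM prices
GROUP BY month_end
ORDER BY month_end
"""

def category_and_overall_avg(rows):
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE prices (month_end TEXT, item TEXT, category TEXT, price INTEGER)")
    con.executemany("INSERT INTO prices VALUES (?, ?, ?, ?)", rows)
    return con.execute(QUERY).fetchall()

for row in category_and_overall_avg(ROWS):
    print(row)
# ('2021-01-31', 10.0, 15.0)
# ('2021-02-28', 14.0, 22.0)
```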

Can I limit the amount of rows to be used for a group in a GROUP BY statement

I'm having an odd problem
I have a table with the columns product_id, sales and day
Not all products have sales every day. I'd like to get the average number of sales that each product had in the last 10 days where it had sales
Usually I'd get the average like this
SELECT product_id, AVG(sales)
FROM table
GROUP BY product_id
Is there a way to limit the amount of rows to be taken into consideration for each product?
I'm afraid it's not possible but I wanted to check if someone has an idea
Update to clarify:
Product may be sold on days 1,3,5,10,15,17,20.
Since I don't want the average over all days, only over the days on which the product was actually sold, something like
SELECT product_id, AVG(sales)
FROM table
WHERE day > '01/01/2009'
GROUP BY product_id
won't work
If you want the last 10 calendar days up to each product's most recent sale:
SELECT t.product_id, AVG(t.sales)
FROM table t
JOIN (
    SELECT product_id, MAX(sales_date) as max_sales_date
    FROM table
    GROUP BY product_id
) t_max ON t.product_id = t_max.product_id
    AND DATEDIFF(day, t.sales_date, t_max.max_sales_date) < 10
GROUP BY t.product_id;
The date difference is SQL server specific, you'd have to replace it with your server syntax for date difference functions.
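The calendar-window version, sketched with Python's sqlite3. Days are stored as plain integers here (a simplifying assumption), so max_day - sales_day stands in for DATEDIFF; the table name, product IDs and sales figures are made up, with product p1 sold on the days listed in the question.

```python
import sqlite3

# Hypothetical data: (product_id, sales_day, sales). p1 is sold on the days from the question.
ROWS = [
    ("p1", 1, 5), ("p1", 3, 7), ("p1", 5, 9), ("p1", 10, 2),
    ("p1", 15, 4), ("p1", 17, 6), ("p1", 20, 8),
    ("p2", 2, 10), ("p2", 4, 20), ("p2", 30, 30),
]

QUERY = """
SELECT t.product_id, AVG(t.sales)
FROM sales_log t
JOIN (
    SELECT product_id, MAX(sales_day) AS max_day
    FROM sales_log
    GROUP BY product_id
) t_max ON t.product_id = t_max.product_id
       AND t_max.max_day - t.sales_day < ?   -- plays the role of DATEDIFF(...) < 10
GROUP BY t.product_id
ORDER BY t.product_id
"""

def avg_last_calendar_days(rows, days=10):
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE sales_log (product_id TEXT, sales_day INT, sales INT)")
    con.executemany("INSERT INTO sales_log VALUES (?, ?, ?)", rows)
    return con.execute(QUERY, (days,)).fetchall()

print(avg_last_calendar_days(ROWS))  # [('p1', 6.0), ('p2', 30.0)]
```

For p1 only days 15, 17 and 20 fall within 10 days of its last sale; for p2 only day 30 does.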
To get the last 10 days when the product had any sale:
SELECT product_id, AVG(sales)
FROM (
SELECT product_id, sales, DENSE_RANK() OVER
(PARTITION BY product_id ORDER BY sales_date DESC) AS rn
FROM Table
) As t_rn
WHERE rn <= 10
GROUP BY product_id;
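A sketch of the DENSE_RANK() version using Python's sqlite3 (SQLite 3.25+). Days are stored as integers for simplicity, the table name and data are invented (p1 is sold on the days listed in the question), and n is a parameter so the cutoff is visible with only a few rows.

```python
import sqlite3

# Hypothetical data: (product_id, sales_day, sales).
ROWS = [
    ("p1", 1, 5), ("p1", 3, 7), ("p1", 5, 9), ("p1", 10, 2),
    ("p1", 15, 4), ("p1", 17, 6), ("p1", 20, 8),
    ("p2", 2, 10), ("p2", 4, 20), ("p2", 30, 30),
]

QUERY = """
SELECT product_id, AVG(sales)
FROM (
    -- rank each product's sale days from the most recent one down
    SELECT product_id, sales,
           DENSE_RANK() OVER (PARTITION BY product_id ORDER BY sales_day DESC) AS rn
    FROM sales_log
) AS t_rn
WHERE rn <= ?
GROUP BY product_id
ORDER BY product_id
"""

def avg_last_sale_days(rows, n=10):
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE sales_log (product_id TEXT, sales_day INT, sales INT)")
    con.executemany("INSERT INTO sales_log VALUES (?, ?, ?)", rows)
    return con.execute(QUERY, (n,)).fetchall()

print(avg_last_sale_days(ROWS, n=3))  # [('p1', 6.0), ('p2', 20.0)]
```

With n=3, p1 averages its sales on days 15, 17 and 20, and p2 averages all three of its sale days, even though day 2 is almost a month before its last sale.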
This assumes sales_date is a date, not a datetime. If the column is a datetime, you'd have to extract the date part.
And finally, a version without window functions:
SELECT product_id, AVG(sales)
FROM Table t
WHERE sales_date IN (
SELECT TOP(10) sales_date
FROM Table s
WHERE t.product_id = s.product_id
ORDER BY sales_date DESC)
GROUP BY product_id;
Again, sales_date is assumed to be a date, not a datetime. Use other limiting syntax if TOP is not supported by your server.
Give this a whirl. The sub-query selects the last ten days of a product where there was a sale, the outer query does the aggregation.
SELECT t1.product_id, SUM(t1.sales) / COUNT(*)
FROM table t1
CROSS APPLY (
    SELECT TOP 10 t2.day
    FROM table t2
    WHERE t2.product_id = t1.product_id
    ORDER BY t2.day DESC
) AS last_days
WHERE last_days.day = t1.day
GROUP BY t1.product_id
BTW: This approach uses a correlated subquery, which may not be very performant, but it should work in theory.
I'm not sure I understand correctly, but if you'd like the average sales over the last 10 days for your products, you can do the following:
SELECT ProductId, Sum(Sales)/Count(*) FROM (SELECT ProductId, Sales FROM Table WHERE SaleDate >= @Date) t GROUP BY ProductId HAVING Count(*) > 0
Or you can use the AVG aggregate function, which is simpler:
SELECT ProductId, AVG(Sales) FROM (SELECT ProductId, Sales FROM Table WHERE SaleDate >= @Date) t GROUP BY ProductId
Updated
Now I understand what you meant. As far as I know, it is not possible to do this in one query. It would be possible if we could do something like this (Northwind database):
select a.CustomerId,count(a.OrderId)
from Orders a INNER JOIN(SELECT CustomerId,OrderDate FROM Orders Order By OrderDate) AS b ON a.CustomerId=b.CustomerId GROUP BY a.CustomerId Having count(a.OrderId)<10
but you can't use ORDER BY in a subquery unless you also use TOP, which is not suitable for this case. Maybe you can do it as follows instead:
SELECT ProductId,Sales INTO #temp FROM table Order By Day
select a.ProductId,Sum(a.Sales) /Count(a.Sales)
from table a INNER JOIN #temp AS b ON a.ProductId=b.ProductId GROUP BY a.ProductId Having count(a.Sales)<=10
If this is a table of sales transactions, then there should not be any rows in it for days on which there were no sales. I.e., if ProductId 21 had no sales on 1 June, this table should not have any rows with ProductId = 21 and day = '1 June'. Therefore you should not have to filter anything out, because there should be nothing to filter.
Select ProductId, Avg(Sales) AvgSales
From Table
Group By ProductId
should work fine. So if it's not, then you have not explained the problem completely or accurately.
Also, in your question you show Avg(Sales) in the example SQL query, but in the text you mention the "average number of sales that each product ...". Do you want the average sales amount, or the average count of sales transactions? And do you want this average by product alone (i.e., one output value reported for each product), or per product per day?
If you want the average per product alone, should it cover just the sales in the ten days before now, or the ten days before the date of each product's last sale?
If the latter then
Select ProductId, Avg(Sales) AvgSales
From Table T
Where day > (Select Max(Day) - 10
From Table
Where ProductId = T.ProductID)
Group By ProductId
If you want the average per product alone, for just those sales in the ten days with sales prior to the date of the last sale for each product, then
Select ProductId, Avg(Sales) AvgSales
From Table T
Where (Select Count(Distinct day) From Table
Where ProductId = T.ProductID
And Day > T.Day) < 10
Group By ProductId