How to use SUM() OVER (partition by)? - sql

Imagine that from 1 to 3 November you sold a certain amount of goods (of two types, A and B), and now you need to determine how much was sold in total on each day.
How can I query the last two columns (the total quantity and total amount per date) so that my result looks like this?
Date Type Quantity Amount Sum_Quantity Sum_Amount
01-11 A 2 100 5 300
01-11 B 3 200 5 300
02-11 A 1 700 3 950
02-11 B 2 250 3 950
03-11 A 2 600 7 800
03-11 B 5 200 7 800
And how can I query, if I want to take the results partitioned by month?

SELECT date,
       type,
       quantity,
       amount,
       -- Partition by date
       SUM(quantity) OVER (PARTITION BY date) AS sum_quantity_date_part,
       SUM(amount) OVER (PARTITION BY date) AS sum_amount_date_part,
       -- Partition by month
       SUM(quantity) OVER (
           PARTITION BY EXTRACT(YEAR FROM date),
                        EXTRACT(MONTH FROM date)
       ) AS sum_quantity_month_part,
       SUM(amount) OVER (
           PARTITION BY EXTRACT(YEAR FROM date),
                        EXTRACT(MONTH FROM date)
       ) AS sum_amount_month_part
FROM sales
ORDER BY date, type
;
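To check the query above against the sample data, here is a minimal sketch that runs it in an in-memory SQLite database from Python. Several things here are assumptions and not part of the original answer: the year 2021 in the dates, the `sales` table definition, and the use of SQLite's `strftime('%Y-%m', ...)` in place of `EXTRACT(YEAR/MONTH ...)`, which SQLite does not support.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE sales (date TEXT, type TEXT, quantity INTEGER, amount INTEGER)"
)
# Sample rows from the question; the year 2021 is an assumption.
conn.executemany(
    "INSERT INTO sales VALUES (?, ?, ?, ?)",
    [
        ("2021-11-01", "A", 2, 100),
        ("2021-11-01", "B", 3, 200),
        ("2021-11-02", "A", 1, 700),
        ("2021-11-02", "B", 2, 250),
        ("2021-11-03", "A", 2, 600),
        ("2021-11-03", "B", 5, 200),
    ],
)

query = """
SELECT date, type, quantity, amount,
       -- every row keeps its detail columns and gains the per-day totals
       SUM(quantity) OVER (PARTITION BY date) AS sum_quantity,
       SUM(amount)   OVER (PARTITION BY date) AS sum_amount,
       -- SQLite stand-in for EXTRACT(YEAR ...), EXTRACT(MONTH ...)
       SUM(quantity) OVER (PARTITION BY strftime('%Y-%m', date)) AS sum_quantity_month,
       SUM(amount)   OVER (PARTITION BY strftime('%Y-%m', date)) AS sum_amount_month
FROM sales
ORDER BY date, type
"""
daily_rows = conn.execute(query).fetchall()
for row in daily_rows:
    print(row)
```

Unlike GROUP BY, the window version does not collapse rows: each of the six input rows is returned, with the totals of its partition repeated next to it.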

Related

Identifying who spent more than a certain amount within any 30 day period?

I have a table that lists each customer's transactions along with the date they occurred and how much was spent. What I want to do is get a list of all customers who spent £3k or more within any 30-day period.
I can get a list of who spent £3k or more within the last 30 days using the code below, but I'm not sure how to adapt this to cover any 30-day period. Any help would be appreciated please!
select *
from
(
select customer_id, sum(spend) as total_spend
from transaction_table
where transaction_date between (current date - 30 days) and current date
group by customer_id
)
where total_spend >=3000
;
Try the following.
The idea is to calculate, for each row, a running sum of SPEND over the preceding 30 days.
WITH TRANSACTION_TABLE (CUSTOMER_ID, TRANSACTION_DATE, SPEND) AS
(
VALUES
(1, DATE ('2021-01-01'), 1000)
, (1, DATE ('2021-01-31'), 2000)
--, (1, DATE ('2021-02-01'), 2000)
)
SELECT DISTINCT CUSTOMER_ID
FROM
(
SELECT
CUSTOMER_ID
--, TRANSACTION_DATE, SPEND
, SUM (SPEND) OVER (PARTITION BY CUSTOMER_ID ORDER BY DAYS (TRANSACTION_DATE) RANGE BETWEEN 30 PRECEDING AND CURRENT ROW) AS SPEND_RTOTAL
FROM TRANSACTION_TABLE
)
WHERE SPEND_RTOTAL >= 3000
You can use SUM() as a window function with a window frame of 30 days. For example:
select *
from (
select t.*,
sum(t.spent) over(
partition by customer_id
order by julian_day(transaction_date)
range between 30 preceding and current row
) as total_spend
from transaction_table t
) x
where total_spend >= 3000
For the data set:
CUSTOMER_ID TRANSACTION_DATE SPENT
------------ ----------------- -----
1 2021-10-01 2000
1 2021-10-15 1500
1 2021-12-01 1000
2 2021-11-01 2500
Result:
CUSTOMER_ID TRANSACTION_DATE SPENT TOTAL_SPEND
------------ ----------------- ------ -----------
1 2021-10-15 1500 3500
See running example at db<>fiddle.
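For readers without a Db2 instance, the same rolling-window idea can be tried in SQLite from Python. This is only a sketch under assumptions: SQLite's `julianday()` stands in for Db2's `julian_day()`, and numeric `RANGE` offsets need a reasonably recent SQLite (3.28 or later, as far as I know). The data set is the one shown above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE transaction_table "
    "(customer_id INTEGER, transaction_date TEXT, spent INTEGER)"
)
conn.executemany(
    "INSERT INTO transaction_table VALUES (?, ?, ?)",
    [
        (1, "2021-10-01", 2000),
        (1, "2021-10-15", 1500),
        (1, "2021-12-01", 1000),
        (2, "2021-11-01", 2500),
    ],
)

query = """
SELECT *
FROM (
    SELECT t.*,
           SUM(t.spent) OVER (
               PARTITION BY customer_id
               -- order by a day number so the frame offset is measured in days
               ORDER BY julianday(transaction_date)
               RANGE BETWEEN 30 PRECEDING AND CURRENT ROW
           ) AS total_spend
    FROM transaction_table t
) x
WHERE total_spend >= 3000
"""
big_spenders = conn.execute(query).fetchall()
print(big_spenders)
```

Only the 2021-10-15 row of customer 1 qualifies, because it is the only row whose window (itself plus the 2021-10-01 purchase) reaches 3000.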

Sales amounts of the top n selling vendors by month in bigquery

I have a table in BigQuery like this (260,000 rows):
vendor date item_price
x 2021-07-08 23:41:10 451,5
y 2021-06-14 10:22:10 41,7
z 2020-01-03 13:41:12 74
s 2020-04-12 01:14:58 88
....
What I want is to group this data by month and, for each month, sum the sales of only the top 20 vendors in that month. Expected output:
month sum_of_only_top20_vendor's_sales
2020-01 7857
2020-02 9685
2020-03 3574
2020-04 7421
.....
Consider the approach below:
select month, sum(sale) as sum_of_only_top20_vendor_sales
from (
select vendor,
format_datetime('%Y%m', date) month,
sum(item_price) as sale
from your_table
group by vendor, month
qualify row_number() over(partition by month order by sale desc) <= 20
)
group by month
Another solution that can potentially perform much better on really big data (note that approx_top_sum returns approximate results):
select month,
(select sum(sum) from t.top_20_vendors) as sum_of_only_top20_vendor_sales
from (
select
format_datetime('%Y%m', date) month,
approx_top_sum(vendor, item_price, 20) top_20_vendors
from your_table
group by month
) t
or, with a little refactoring:
select month, sum(sum) as sum_of_only_top20_vendor_sales
from (
select
format_datetime('%Y%m', date) month,
approx_top_sum(vendor, item_price, 20) top_20_vendors
from your_table
group by month
) t, t.top_20_vendors
group by month
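The exact (non-approximate) variant can be sketched outside BigQuery as well. The example below is an illustration under assumptions, not the answerer's code: it uses SQLite from Python, replaces `qualify` with a filtering outer query because SQLite has no `QUALIFY`, uses `strftime` instead of `format_datetime`, takes the top 2 vendors instead of the top 20, and runs on made-up rows.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE your_table (vendor TEXT, date TEXT, item_price REAL)")
# Hypothetical sample rows, only loosely modelled on the question's table.
conn.executemany(
    "INSERT INTO your_table VALUES (?, ?, ?)",
    [
        ("x", "2021-07-08 23:41:10", 100.0),
        ("x", "2021-07-20 10:00:00", 50.0),
        ("y", "2021-07-14 10:22:10", 80.0),
        ("z", "2021-07-03 13:41:12", 30.0),
        ("y", "2021-08-01 09:00:00", 40.0),
        ("z", "2021-08-12 01:14:58", 60.0),
        ("s", "2021-08-15 12:00:00", 10.0),
    ],
)

query = """
SELECT month, SUM(sale) AS sum_of_top_vendor_sales
FROM (
    -- rank each vendor's monthly total within its month
    SELECT month, sale,
           ROW_NUMBER() OVER (PARTITION BY month ORDER BY sale DESC) AS rn
    FROM (
        SELECT vendor,
               strftime('%Y-%m', date) AS month,
               SUM(item_price) AS sale
        FROM your_table
        GROUP BY vendor, strftime('%Y-%m', date)
    ) AS vendor_month
) AS ranked
WHERE rn <= :top_n
GROUP BY month
ORDER BY month
"""
top_sales = conn.execute(query, {"top_n": 2}).fetchall()
print(top_sales)
```

In July the vendor totals are x = 150, y = 80 and z = 30, so with a top 2 cut-off only x and y are summed.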

SQL Cumulative Sum by Group by time condition

I have a table with columns dummy_id, date_registered, item_id, quantity, price, like this:
dummy_id date_registered item_id quantity price my_cumulative
1 2013-07-01 100 10 34.5 10
2 2013-07-01 145 8 2.3 8
3 2013-07-11 100 20 34.5 30
4 2013-07-23 100 15 34.5 45
5 2013-07-24 145 10 34.5 18
To calculate the my_cumulative column, which contains the cumulative total of quantity for each item_id ordered by date_registered, I use this code:
select dummy_id, date_registered, item_id, quantity, price,
sum(quantity) over (partition by item_id order by date_registered) as cumulative
from table t;
It works well. But what if I want the my_cumulative column to count, for each row, only orders from the last month? (That is, sum the quantity only over rows whose date_registered is no more than a month before the current row's.)
Is there any way to do this in SQL (preferably PostgreSQL)?
If you want cumulative quantities for the current month -- which is what I suspect you want, then change the partition by:
select dummy_id, date_registered, item_id, quantity, price,
sum(quantity) over (partition by item_id, date_trunc('month', date_registered) order by date_registered) as cumulative
from table t;
If you really want the last month, then use a range window frame with interval:
select dummy_id, date_registered, item_id, quantity, price,
sum(quantity) over (partition by item_id
order by date_registered
range between interval '1 month' preceding and current row
) as cumulative
from table t;
The first seems much more useful to me.
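To see what the first variant (partitioning by item and calendar month) does, here is a small sketch in SQLite via Python. It is not PostgreSQL: `strftime('%Y-%m', ...)` is my stand-in for `date_trunc('month', ...)`, the table name `orders` is made up, and the row with dummy_id 6 is a hypothetical addition so that the reset at the start of a new month becomes visible.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE orders (dummy_id INTEGER, date_registered TEXT, "
    "item_id INTEGER, quantity INTEGER, price REAL)"
)
conn.executemany(
    "INSERT INTO orders VALUES (?, ?, ?, ?, ?)",
    [
        (1, "2013-07-01", 100, 10, 34.5),
        (2, "2013-07-01", 145, 8, 2.3),
        (3, "2013-07-11", 100, 20, 34.5),
        (4, "2013-07-23", 100, 15, 34.5),
        (5, "2013-07-24", 145, 10, 34.5),
        (6, "2013-08-02", 100, 5, 34.5),  # hypothetical extra row in a new month
    ],
)

query = """
SELECT dummy_id, date_registered, item_id, quantity,
       SUM(quantity) OVER (
           -- a new partition starts for every item and every calendar month
           PARTITION BY item_id, strftime('%Y-%m', date_registered)
           ORDER BY date_registered
       ) AS cumulative
FROM orders
ORDER BY dummy_id
"""
monthly_cumulative = conn.execute(query).fetchall()
for row in monthly_cumulative:
    print(row)
```

The July rows reproduce the my_cumulative values from the question, while the August row starts again at its own quantity instead of continuing from 45.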

How to do a Min and Max of date but following the changes in price points

I'm not really sure how to word this question better so I'll provide the data that I have and the result that I'm after.
This is the data that I have
sku sales qty date
A 100 1 1-Jan-19
A 200 2 2-Jan-19
A 100 1 3-Jan-19
A 240 2 4-Jan-19
A 360 3 5-Jan-19
A 360 4 6-Jan-19
A 200 2 7-Jan-19
A 90 1 8-Jan-19
B 100 1 9-Jan-19
B 200 2 10-Jan-19
And this is the result that I'm after
sku price sum(qty) sum(sales) min(date) max(date)
A 100 4 400 1-Jan-19 3-Jan-19
A 120 5 600 4-Jan-19 5-Jan-19
A 90 4 360 6-Jan-19 6-Jan-19
A 100 2 200 7-Jan-19 7-Jan-19
A 90 1 90 8-Jan-19 8-Jan-19
B 100 3 300 9-Jan-19 10-Jan-19
As you can see, I'm trying to get the min and max date of each price point, where price = sales/qty. At this point, I can get the min and max date of the same price, but I can't separate the ranges when there's another price in between. I think I have to use some sort of min(date) over (partition by sales/qty order by date) but I can't figure it out yet.
I'm using Redshift SQL
This is a gaps-and-islands query. You can do this by generating a sequence and subtracting that from the date. Then aggregate:
select sku, price, sum(qty), sum(sales),
       min(date), max(date)
from (select t.*,
             -- price is not a stored column, so derive it as sales / qty
             sales / qty as price,
             row_number() over (partition by sku, sales / qty order by date) as seqnum
      from t
     ) t
group by sku, price, (date - seqnum * interval '1 day')
order by sku, price, min(date);
You can do this with a subquery and LAG:
SELECT SKU, Price, SUM(Qty) SumQty, SUM(Sales) SumSales, MIN(date) MinDate, MAX(date) MaxDate
FROM (
SELECT SKU,Price,SUM(is_change) OVER(order by SKU, date) is_change,Sales, Qty,date
FROM (SELECT SKU, Sales/Qty AS Price, Sales, Qty,date,
CASE WHEN Sales/Qty = lag(Sales/Qty) over (order by SKU, date)
and SKU = lag(SKU) OVER (order by SKU, date) then 0 ELSE 1 END AS is_change
FROM Tbl
)InnerSelect
) X GROUP BY sku, price,is_change
ORDER BY SKU,MIN(date)
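The LAG-based idea can be checked against the question's data with a short sketch in SQLite via Python. This is my own restatement under assumptions, not the exact query above: the table is named `t`, the dates are rewritten in ISO format so that they sort correctly as text, the window functions are partitioned by sku instead of ordered by (sku, date), and CTEs replace the nested subqueries.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (sku TEXT, sales INTEGER, qty INTEGER, date TEXT)")
conn.executemany(
    "INSERT INTO t VALUES (?, ?, ?, ?)",
    [
        ("A", 100, 1, "2019-01-01"), ("A", 200, 2, "2019-01-02"),
        ("A", 100, 1, "2019-01-03"), ("A", 240, 2, "2019-01-04"),
        ("A", 360, 3, "2019-01-05"), ("A", 360, 4, "2019-01-06"),
        ("A", 200, 2, "2019-01-07"), ("A", 90, 1, "2019-01-08"),
        ("B", 100, 1, "2019-01-09"), ("B", 200, 2, "2019-01-10"),
    ],
)

query = """
WITH flagged AS (
    SELECT sku, sales, qty, date, sales / qty AS price,
           -- 1 when the price differs from the previous row of the same sku
           CASE WHEN sales / qty = LAG(sales / qty) OVER (PARTITION BY sku ORDER BY date)
                THEN 0 ELSE 1 END AS is_change
    FROM t
),
grouped AS (
    -- a running sum of the change flags numbers the islands within each sku
    SELECT *, SUM(is_change) OVER (PARTITION BY sku ORDER BY date) AS island
    FROM flagged
)
SELECT sku, price, SUM(qty), SUM(sales), MIN(date), MAX(date)
FROM grouped
GROUP BY sku, island, price
ORDER BY sku, MIN(date)
"""
islands = conn.execute(query).fetchall()
for row in islands:
    print(row)
```

Note that `sales / qty` is integer division here, which is harmless for this data because every price is a whole number; with fractional prices a cast to a decimal type would be needed.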

Compare between values from the same table in postgresql

I have the following table:
id partid orderdate qty price
1 10 01/01/2017 10 3
2 10 02/01/2017 5 9
3 11 01/01/2017 0.5 0.001
4 145 02/01/2017 5 18
5 10 12/12/2016 8 7
6 10 05/07/2010 81 7.5
Basically I want to compare the most recent purchase of each part to the other purchases of the same part within a period of 24 months. For example, compare id=2 to id=1 and id=5.
I want to check if the price of the latest orderdate (per part) is larger than the average price of that part in the last 24 months.
So first I need to calculate the avg price:
partid avgprice
10 (3+9+7)/3=6.33 (7.5 is out of range)
11 0.001
145 18
I also need to know the latest orderdate of each part:
id partid
2 10
3 11
4 145
and then I need to check if the prices of id=2, id=3, id=4 (the latest purchases) are bigger than the average. If they are, I need to return their partid.
So I should have something like this:
id partid avgprice lastprice
2 10 6.33 9
3 11 0.001 0.001
4 145 18 18
Finally I need to return partid=10 since 9>6.33
Now to my questions...
I'm not sure how I can find the latest order in PostgreSQL.
I tried:
select id, distinct partid,orderdate
from table
where orderdate> current_date - interval '24 months'
order by orderdate desc
This gives :
ERROR: syntax error at or near "distinct".
I'm a bit lost here. I know what I want to do but I can't translate it into SQL. Can anyone help?
Get the average price per part and the last order per part and join these:
select
lastorder.id,
lastorder.partid,
lastorder.orderdate,
lastorder.price as lastprice,
avgorder.price as avgprice
from
(
select
partid,
avg(price) as price
from mytable
where orderdate >= current_date - interval '24 months'
group by partid
) avgorder
join
(
select distinct on (partid)
id,
partid,
orderdate,
price
from mytable
order by partid, orderdate desc
) lastorder on lastorder.partid = avgorder.partid
and lastorder.price > avgorder.price;
This can be solved without distinct (which is heavy on the DB anyways):
with avg_price as (
select partid, avg(price) as price
from table
where orderdate> current_date - interval '24 months'
group by partid
)
select f.id, f.partid, av.price, f.price
from (
select id, partid, orderdate, price, rank() over (partition by partid order by orderdate desc)
from table
) as f
join avg_price av on f.partid = av.partid
where f.rank = 1
and av.price < f.price
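A quick way to verify the logic against the question's table is the sketch below, which runs a close analogue of the query in SQLite from Python. The assumptions are mine: the dates are read as dd/mm/yyyy and stored in ISO format, the table is called `mytable`, "today" is fixed at a hypothetical 2017-02-01 so that the 24-month window covers the sample rows, `date(:today, '-24 months')` replaces PostgreSQL's `current_date - interval '24 months'`, and `row_number()` is used in place of `rank()`.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE mytable "
    "(id INTEGER, partid INTEGER, orderdate TEXT, qty REAL, price REAL)"
)
conn.executemany(
    "INSERT INTO mytable VALUES (?, ?, ?, ?, ?)",
    [
        (1, 10, "2017-01-01", 10, 3),
        (2, 10, "2017-01-02", 5, 9),
        (3, 11, "2017-01-01", 0.5, 0.001),
        (4, 145, "2017-01-02", 5, 18),
        (5, 10, "2016-12-12", 8, 7),
        (6, 10, "2010-07-05", 81, 7.5),
    ],
)

query = """
WITH avg_price AS (
    -- average price per part over the last 24 months
    SELECT partid, AVG(price) AS avgprice
    FROM mytable
    WHERE orderdate >= date(:today, '-24 months')
    GROUP BY partid
),
ranked AS (
    -- rn = 1 marks the most recent order of each part
    SELECT id, partid, price,
           ROW_NUMBER() OVER (PARTITION BY partid ORDER BY orderdate DESC) AS rn
    FROM mytable
)
SELECT r.id, r.partid, a.avgprice, r.price AS lastprice
FROM ranked r
JOIN avg_price a ON a.partid = r.partid
WHERE r.rn = 1
  AND r.price > a.avgprice
"""
above_average = conn.execute(query, {"today": "2017-02-01"}).fetchall()
print(above_average)
```

Only part 10 is returned: its latest price (9) exceeds the 24-month average of (3 + 9 + 7) / 3, while for parts 11 and 145 the single order equals the average and is filtered out by the strict comparison.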