Calculate percentage change of price based on Category with SQL

I am writing a Query with SQL and couldn't figure it out yet...
My table looks like this:
Category Price Date
Cat1 20 2019-04
Cat2 12 2019-04
Cat3 5 2019-04
Cat1 23 2020-04
Cat2 17 2020-04
Cat3 8 2020-04
I would like to get a table that shows this:
Cat Pct_change Period
Cat 1 0.15 2019-2020
Cat 2 0.41 "
And so on.
I can get this category by category, but I have around 100 categories, so I can't do this manually. It would also be great to see both prices side by side. What I can't do is create new tables just to save the data and then join them...
Thank you!!

Use the LEAD() window function to get the price and date of the next date for each category:
SELECT Category,
       ROUND(1.0 * next_price / Price - 1, 2) Pct_change,
       SUBSTR(Date, 1, 4) || '-' || SUBSTR(next_date, 1, 4) Period
FROM (
  SELECT *,
         LEAD(Price) OVER (PARTITION BY Category ORDER BY Date) next_price,
         LEAD(Date) OVER (PARTITION BY Category ORDER BY Date) next_date
  FROM tablename
)
WHERE next_date IS NOT NULL
Results:
Category Pct_change Period
Cat1 0.15 2019-2020
Cat2 0.42 2019-2020
Cat3 0.6 2019-2020
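For anyone who wants to try this locally, here is a minimal sketch that runs the LEAD() query against the question's sample rows with Python's built-in sqlite3 module. The in-memory database, the old_price/new_price columns (added because the question asked to see both prices side by side), and the ORDER BY are my additions, and it assumes an SQLite build with window functions (3.25 or later).

```python
import sqlite3

# In-memory database holding the sample rows from the question.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tablename (Category TEXT, Price INTEGER, Date TEXT)")
conn.executemany(
    "INSERT INTO tablename VALUES (?, ?, ?)",
    [("Cat1", 20, "2019-04"), ("Cat2", 12, "2019-04"), ("Cat3", 5, "2019-04"),
     ("Cat1", 23, "2020-04"), ("Cat2", 17, "2020-04"), ("Cat3", 8, "2020-04")],
)

# Same LEAD() idea as the answer; old_price/new_price are extra columns
# added here so both prices appear side by side.
query = """
SELECT Category,
       Price      AS old_price,
       next_price AS new_price,
       ROUND(1.0 * next_price / Price - 1, 2) AS Pct_change,
       SUBSTR(Date, 1, 4) || '-' || SUBSTR(next_date, 1, 4) AS Period
FROM (
    SELECT *,
           LEAD(Price) OVER (PARTITION BY Category ORDER BY Date) AS next_price,
           LEAD(Date)  OVER (PARTITION BY Category ORDER BY Date) AS next_date
    FROM tablename
)
WHERE next_date IS NOT NULL
ORDER BY Category
"""
pct_rows = conn.execute(query).fetchall()
for row in pct_rows:
    print(row)
```

The `1.0 *` factor matters: without it, integer prices would be divided as integers and every change would round to 0.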

You can use first_value():
select distinct category,
       min(date) over (partition by category) as first_date,
       max(date) over (partition by category) as last_date,
       (-1 + 1.0 * first_value(price) over (partition by category order by date desc) /
             first_value(price) over (partition by category order by date asc)
       ) as percent_change
from t;

Related

Calculating average time between customer orders and average order value in Postgres

In PostgreSQL I have an orders table that represents orders made by customers of a store:
SELECT * FROM orders
order_id customer_id value created_at
1 1 188.01 2020-11-24
2 2 25.74 2022-10-13
3 1 159.64 2022-09-23
4 1 201.41 2022-04-01
5 3 357.80 2022-09-05
6 2 386.72 2022-02-16
7 1 200.00 2022-01-16
8 1 19.99 2020-02-20
For a specified time range (e.g. 2022-01-01 to 2022-12-31), I need to find the following:
Average 1st order value
Average 2nd order value
Average 3rd order value
Average 4th order value
E.g. the 1st purchases for each customer are:
for customer_id 1, order_id 8 is their first purchase
customer 2, order 6
customer 3, order 5
So, the 1st-purchase average order value is (19.99 + 386.72 + 357.80) / 3 = $254.84
This needs to be found for the 2nd, 3rd and 4th purchases also.
I also need to find the average time between purchases:
order 1 to order 2
order 2 to order 3
order 3 to order 4
The final result would ideally look something like this:
order_number AOV av_days_since_last_order
1 254.84 0
2 300.00 28
3 322.22 21
4 350.00 20
Note that average days since last order for order 1 would always be 0 as it's the 1st purchase.
Thanks.
select order_number
,round(avg(value),2) as AOV
,coalesce(round(avg(days_between_orders),0),0) as av_days_since_last_order
from
(
select *
,row_number() over(partition by customer_id order by created_at) as order_number
,created_at - lag(created_at) over(partition by customer_id order by created_at) as days_between_orders
from t
) t
where created_at between '2022-01-01' and '2022-12-31'
group by order_number
order by order_number
order_number aov av_days_since_last_order
1 372.26 0
2 25.74 239
3 200.00 418
4 201.41 75
5 159.64 175
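To check those numbers without a Postgres instance, here is a rough translation of the query above to SQLite, run through Python's sqlite3 module. Two things are assumptions on my part: the table is named orders (as in the question) rather than t, and the Postgres date subtraction is replaced with julianday() differences, which is the SQLite way to get a day count. Window functions need SQLite 3.25 or later.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE orders (order_id INTEGER, customer_id INTEGER, value REAL, created_at TEXT)"
)
conn.executemany("INSERT INTO orders VALUES (?, ?, ?, ?)", [
    (1, 1, 188.01, "2020-11-24"), (2, 2, 25.74, "2022-10-13"),
    (3, 1, 159.64, "2022-09-23"), (4, 1, 201.41, "2022-04-01"),
    (5, 3, 357.80, "2022-09-05"), (6, 2, 386.72, "2022-02-16"),
    (7, 1, 200.00, "2022-01-16"), (8, 1, 19.99, "2020-02-20"),
])

# The window functions run over the full history; the date filter is
# applied afterwards, so order_number counts from a customer's very first order.
query = """
SELECT order_number,
       ROUND(AVG(value), 2) AS aov,
       COALESCE(ROUND(AVG(days_between_orders), 0), 0) AS av_days_since_last_order
FROM (
    SELECT *,
           ROW_NUMBER() OVER (PARTITION BY customer_id ORDER BY created_at) AS order_number,
           julianday(created_at)
             - julianday(LAG(created_at) OVER (PARTITION BY customer_id ORDER BY created_at))
             AS days_between_orders
    FROM orders
)
WHERE created_at BETWEEN '2022-01-01' AND '2022-12-31'
GROUP BY order_number
ORDER BY order_number
"""
aov_rows = conn.execute(query).fetchall()
for row in aov_rows:
    print(row)
```

Because the numbering happens before the WHERE clause, customer 1's first order inside 2022 is reported as order number 3, which is why five order numbers appear even though the question only asked about four.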
I suppose it should be something like this:
WITH prep_data AS (
    SELECT order_id,
           customer_id,
           -- number each customer's orders chronologically within the date range
           ROW_NUMBER() OVER (PARTITION BY customer_id ORDER BY created_at) AS purchase_num,
           created_at,
           value
    FROM orders
    WHERE created_at BETWEEN :date_from AND :date_to
), prep_data2 AS (
    SELECT pd1.customer_id,
           pd1.purchase_num,
           -- days since the same customer's previous order (NULL for the first one)
           pd1.created_at - pd2.created_at AS date_diff,
           pd1.value
    FROM prep_data pd1
    LEFT JOIN prep_data pd2
           ON pd1.customer_id = pd2.customer_id
          AND pd1.purchase_num = pd2.purchase_num + 1
)
SELECT purchase_num,
       avg(value) AS avg_val,
       avg(date_diff) AS avg_date_diff
FROM prep_data2
GROUP BY purchase_num
ORDER BY purchase_num

How to use SQL to get column count for a previous date?

I have the following table,
id status price date
2 complete 10 2020-01-01 10:10:10
2 complete 20 2020-02-02 10:10:10
2 complete 10 2020-03-03 10:10:10
3 complete 10 2020-04-04 10:10:10
4 complete 10 2020-05-05 10:10:10
Required output,
id status_count price ratio
2 0 0 0
2 1 10 0
2 2 30 0.33
I am looking to add the price for previous row. Row 1 is 0 because it has no previous row value.
Find ratio ie 10/30=0.33
You can use analytical function ROW_NUMBER and SUM as follows:
SELECT
id,
ROW_NUMBER() OVER (PARTITION BY id ORDER BY date) - 1 AS status_count,
COALESCE(SUM(price) OVER (PARTITION BY id ORDER BY date), 0) - price as price
FROM yourTable;
I think you want something like this:
SELECT
id,
COUNT(*) OVER (PARTITION BY id ORDER BY date) - 1 AS status_count,
COALESCE(SUM(price) OVER (PARTITION BY id
ORDER BY date ROWS BETWEEN
UNBOUNDED PRECEDING AND 1 PRECEDING), 0) price
FROM yourTable;
Please also check another method:
with cte as (
    select *,
           ROW_NUMBER() OVER (PARTITION BY id ORDER BY date) - 1 AS status_count,
           SUM(price) OVER (PARTITION BY id ORDER BY date) AS ss
    from yourTable
)
select id, status_count, isnull(ss, 0) - price AS price
from cte
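To see what the "sum of all previous rows" frame produces, here is a small sketch that runs the ROWS BETWEEN UNBOUNDED PRECEDING AND 1 PRECEDING variant on the question's sample data using Python's sqlite3 module. SQLite is only a stand-in here (the question does not say which database it uses), and the outer ORDER BY is my addition to make the output deterministic.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE yourTable (id INTEGER, status TEXT, price INTEGER, date TEXT)")
conn.executemany("INSERT INTO yourTable VALUES (?, ?, ?, ?)", [
    (2, "complete", 10, "2020-01-01 10:10:10"),
    (2, "complete", 20, "2020-02-02 10:10:10"),
    (2, "complete", 10, "2020-03-03 10:10:10"),
    (3, "complete", 10, "2020-04-04 10:10:10"),
    (4, "complete", 10, "2020-05-05 10:10:10"),
])

# The frame ends one row before the current row, so the first row of each
# id has an empty frame (SUM is NULL) and COALESCE turns that into 0.
query = """
SELECT id,
       COUNT(*) OVER (PARTITION BY id ORDER BY date) - 1 AS status_count,
       COALESCE(SUM(price) OVER (PARTITION BY id ORDER BY date
                                 ROWS BETWEEN UNBOUNDED PRECEDING AND 1 PRECEDING), 0) AS price
FROM yourTable
ORDER BY id, date
"""
prev_rows = conn.execute(query).fetchall()
for row in prev_rows:
    print(row)
```

For id 2 this gives the 0, 10, 30 sequence from the required output. The ratio column is left out because the question does not pin down how it should be computed for the second row.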

BigQuery missing rows with SUM OVER PARTITION BY

TL;DR:
Given this table:
WITH subscriptions AS (SELECT TIMESTAMP("2020-11-01") as date, "premium" as product, 50 as diff
UNION ALL SELECT TIMESTAMP("2020-11-01"), "basic", 100
UNION ALL SELECT TIMESTAMP("2020-11-02"), "basic", -10
UNION ALL SELECT TIMESTAMP("2020-11-03"), "premium", 20
UNION ALL SELECT TIMESTAMP("2020-11-03"), "basic", 40
)
How do I get a table where the missing date/product combination (2020-11-02 - premium) is included with a fallback value of 0 for diff?
Ideally, this should work for multiple products. A list of all products can be obtained like this:
SELECT ARRAY_AGG(DISTINCT product) FROM subscriptions
I want to be able to get the subscription count per day, either for all products or just for some products.
I think the easiest way to achieve this is to prepare a table that looks like this:
|---------------------|------------------|------------------|
| date | product | total |
|---------------------|------------------|------------------|
| 2020-11-01 | premium | 100 |
|---------------------|------------------|------------------|
| 2020-11-01 | basic | 50 |
|---------------------|------------------|------------------|
With this table, I can easily group by date and product or just by date and sum the total.
Before I get to the result table I have generated a table where for each day and product I calculate the difference in subscriptions. How many new subscribers for each product are there and how many are no longer subscribed.
This table looks like this:
|---------------------|------------------|------------------|
| date | product | diff |
|---------------------|------------------|------------------|
| 2020-11-01 | premium | 50 |
|---------------------|------------------|------------------|
| 2020-11-01 | basic | -20 |
|---------------------|------------------|------------------|
Meaning on November, 1st the total count of premium subscribers increased by 50, and the total count of basic subscribers decreased by 20.
The problem now is that this temporary table is missing data points for days on which one product had no changes; see the example below.
When I started, there was no product column and I only had the date and diff columns.
To get from the second table to the first, I used this query, which worked perfectly:
WITH subscriptions AS (SELECT TIMESTAMP("2020-11-01") as date, 150 as diff
UNION ALL SELECT TIMESTAMP("2020-11-02"), -10
UNION ALL SELECT TIMESTAMP("2020-11-03"), 60
)
SELECT
*,
SUM(diff) OVER (ORDER BY date) as total_subscriptions
FROM subscriptions
ORDER BY date
But when I add the product column and try to calculate the sum per day and product there are some data points missing.
WITH subscriptions AS (SELECT TIMESTAMP("2020-11-01") as date, "premium" as product, 50 as diff
UNION ALL SELECT TIMESTAMP("2020-11-01"), "basic", 100
UNION ALL SELECT TIMESTAMP("2020-11-02"), "basic", -10
UNION ALL SELECT TIMESTAMP("2020-11-03"), "premium", 20
UNION ALL SELECT TIMESTAMP("2020-11-03"), "basic", 40
)
SELECT
*,
SUM(diff) OVER (PARTITION BY product ORDER BY date) as total_subscriptions
FROM subscriptions
ORDER BY date
--
|---------------------|------------------|------------------|
| date | product | total |
|---------------------|------------------|------------------|
| 2020-11-01 | basic | 100 |
|---------------------|------------------|------------------|
| 2020-11-01 | premium | 50 |
|---------------------|------------------|------------------|
| 2020-11-02 | basic | 90 |
|---------------------|------------------|------------------|
| 2020-11-03 | basic | 130 |
|---------------------|------------------|------------------|
| 2020-11-03 | premium | 70 |
|---------------------|------------------|------------------|
If I now show the total number of subscriptions per day, I would get:
150 -> 90 -> 200
But I would expect:
150 -> 140 -> 200
Same goes for the total number of premium subscriptions per day:
50 -> 0 -> 70
But I would expect:
50 -> 50 -> 70
I believe the best option to fix this would be to add the missing date/product combinations.
How would I do this?
-- Try this: I create a table listing the products, add a "Total" product to that list, and join it with your table to get the data you need.
WITH subscriptions AS (SELECT TIMESTAMP("2020-11-01") as date, "premium" as product, 50 as diff
UNION ALL SELECT TIMESTAMP("2020-11-01"), "basic", 100
UNION ALL SELECT TIMESTAMP("2020-11-02"), "basic", -10
UNION ALL SELECT TIMESTAMP("2020-11-03"), "premium", 20
UNION ALL SELECT TIMESTAMP("2020-11-03"), "basic", 40
),
product_name as (
Select product from subscriptions group by 1
union all
Select "Total" as product
)
Select date
,product
,total_subscriptions
from (
Select a.date
,a.product
,diff
,SUM(diff) OVER (PARTITION BY a.product ORDER BY a.date) as total_subscriptions
from
(
Select date,a.product
from product_name A
join subscriptions B
on 1=1
where a.product !='Total'
group by 1,2
) A
left join subscriptions B
on A.product = B.product
and A.date = B.date
group by 1,2,3
) group by 1,2,3
union all
Select date
,product
,total_subscriptions
from
(
Select date,a.product
,diff
,SUM(diff) OVER (PARTITION BY a.product ORDER BY date) as total_subscriptions
from product_name A
join subscriptions B
on 1=1
where a.product ='Total'
group by 1,2,3
) group by 1,2,3
order by 1,2
If I follow you correctly, one approach is to generate a fixed list of dates for the period you want and cross join it with the list of products. This gives you all possible combinations. Then you can bring in the subscriptions table with a left join, and finally perform the window sum:
select dt, p.product, sum(s.diff) over(partition by p.product order by dt) total
from unnest(generate_timestamp_array(
    timestamp('2020-11-01'),
    timestamp('2020-11-03'),
    interval 1 day)
) dt
cross join (
    select 'basic' product
    union all select 'premium'
) p
left join subscriptions s on s.product = p.product and s.date = dt
We can make the query more generic by dynamically generating the date range and the list of products:
select dt, p.product, sum(s.diff) over(partition by p.product order by dt) total
from (select min(date) min_dt, max(date) max_dt from subscriptions) d0
cross join unnest(generate_timestamp_array(d0.min_dt, d0.max_dt, interval 1 day)) dt
cross join (select distinct product from subscriptions) p
left join subscriptions s on s.product = p.product and s.date = dt
Use GENERATE_TIMESTAMP_ARRAY:
WITH subscriptions AS (SELECT TIMESTAMP("2020-11-01") as date, "premium" as product, 50 as diff
UNION ALL SELECT TIMESTAMP("2020-11-01"), "basic", 100
UNION ALL SELECT TIMESTAMP("2020-11-02"), "basic", -10
UNION ALL SELECT TIMESTAMP("2020-11-03"), "premium", 20
UNION ALL SELECT TIMESTAMP("2020-11-03"), "basic", 40
),
dates AS (
SELECT *
FROM UNNEST(GENERATE_TIMESTAMP_ARRAY('2020-11-01 00:00:00', '2020-11-03 00:00:00', INTERVAL 1 DAY)) as date
),
products AS (
SELECT DISTINCT product FROM subscriptions
)
SELECT dates.date, products.product, subscriptions.diff
FROM dates
CROSS JOIN products
LEFT JOIN subscriptions
ON subscriptions.date = dates.date AND subscriptions.product = products.product
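The answers in this thread share one idea: build every date/product combination, left join the real rows onto it, then run the window sum. Here is a minimal sketch of that idea that can be run without BigQuery, using Python's sqlite3 module. It is a translation, not BigQuery syntax: the recursive CTE stands in for GENERATE_TIMESTAMP_ARRAY, dates are stored as plain text, and the COALESCE to 0 is my way of supplying the fallback diff the question asked for.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE subscriptions (date TEXT, product TEXT, diff INTEGER)")
conn.executemany("INSERT INTO subscriptions VALUES (?, ?, ?)", [
    ("2020-11-01", "premium", 50), ("2020-11-01", "basic", 100),
    ("2020-11-02", "basic", -10),
    ("2020-11-03", "premium", 20), ("2020-11-03", "basic", 40),
])

query = """
WITH RECURSIVE dates(dt) AS (
    -- one row per day between the first and last date in the data
    SELECT MIN(date) FROM subscriptions
    UNION ALL
    SELECT date(dt, '+1 day') FROM dates
    WHERE dt < (SELECT MAX(date) FROM subscriptions)
),
products AS (
    SELECT DISTINCT product FROM subscriptions
)
SELECT d.dt,
       p.product,
       COALESCE(s.diff, 0) AS diff,
       SUM(COALESCE(s.diff, 0)) OVER (PARTITION BY p.product ORDER BY d.dt) AS total
FROM dates d
CROSS JOIN products p
LEFT JOIN subscriptions s
       ON s.date = d.dt AND s.product = p.product
ORDER BY d.dt, p.product
"""
filled_rows = conn.execute(query).fetchall()

# Sum the per-product running totals into one total per day.
daily_totals = {}
for dt, product, diff, total in filled_rows:
    daily_totals[dt] = daily_totals.get(dt, 0) + total

for row in filled_rows:
    print(row)
print(daily_totals)
```

With the gap filled, premium on 2020-11-02 carries its running total of 50 forward, so the daily totals come out as the 150, 140, 200 sequence the question expected.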

How to do a Min and Max of date but following the changes in price points

I'm not really sure how to word this question better so I'll provide the data that I have and the result that I'm after.
This is the data that I have
sku sales qty date
A 100 1 1-Jan-19
A 200 2 2-Jan-19
A 100 1 3-Jan-19
A 240 2 4-Jan-19
A 360 3 5-Jan-19
A 360 4 6-Jan-19
A 200 2 7-Jan-19
A 90 1 8-Jan-19
B 100 1 9-Jan-19
B 200 2 10-Jan-19
And this is the result that I'm after
sku price sum(qty) sum(sales) min(date) max(date)
A 100 4 400 1-Jan-19 3-Jan-19
A 120 5 600 4-Jan-19 5-Jan-19
A 90 4 360 6-Jan-19 6-Jan-19
A 100 2 200 7-Jan-19 7-Jan-19
A 90 1 90 8-Jan-19 8-Jan-19
B 100 3 300 9-Jan-19 10-Jan-19
As you can see, I'm trying to get the min and max date of each price point, where price = sales/qty. At this point, I can get the min and max date of the same price, but I can't separate them when there's another price in between. I think I have to use some sort of min(date) over (partition by sales/qty order by date), but I can't figure it out yet.
I'm using Redshift SQL
This is a gaps-and-islands problem. You can solve it by generating a sequence and subtracting that from the date, then aggregating:
select sku, price, sum(qty), sum(sales),
       min(date), max(date)
from (select t.*, sales / qty as price,
             row_number() over (partition by sku, sales / qty order by date) as seqnum
      from t
     ) t
group by sku, price, (date - seqnum * interval '1 day')
order by sku, price, min(date);
You can do this with a subquery and LAG:
SELECT SKU, Price, SUM(Qty) SumQty, SUM(Sales) SumSales, MIN(date) MinDate, MAX(date) MaxDate
FROM (
SELECT SKU,Price,SUM(is_change) OVER(order by SKU, date) is_change,Sales, Qty,date
FROM (SELECT SKU, Sales/Qty AS Price, Sales, Qty,date,
CASE WHEN Sales/Qty = lag(Sales/Qty) over (order by SKU, date)
and SKU = lag(SKU) OVER (order by SKU, date) then 0 ELSE 1 END AS is_change
FROM Tbl
)InnerSelect
) X GROUP BY sku, price,is_change
ORDER BY SKU,MIN(date)
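Here is a runnable sketch of the LAG-based approach against the question's sample rows, using Python's sqlite3 module in place of Redshift. The table name tbl, the ISO-formatted dates, and partitioning the windows by sku (instead of ordering by sku and date in one window) are my choices for the sketch. It flags each row whose price differs from the previous row of the same sku, turns the running sum of those flags into an island number, and groups by it.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tbl (sku TEXT, sales INTEGER, qty INTEGER, date TEXT)")
conn.executemany("INSERT INTO tbl VALUES (?, ?, ?, ?)", [
    ("A", 100, 1, "2019-01-01"), ("A", 200, 2, "2019-01-02"),
    ("A", 100, 1, "2019-01-03"), ("A", 240, 2, "2019-01-04"),
    ("A", 360, 3, "2019-01-05"), ("A", 360, 4, "2019-01-06"),
    ("A", 200, 2, "2019-01-07"), ("A", 90, 1, "2019-01-08"),
    ("B", 100, 1, "2019-01-09"), ("B", 200, 2, "2019-01-10"),
])

query = """
SELECT sku, price, SUM(qty), SUM(sales), MIN(date), MAX(date)
FROM (
    -- running count of price changes = island number within each sku
    SELECT *,
           SUM(is_change) OVER (PARTITION BY sku ORDER BY date) AS island
    FROM (
        -- is_change is 1 on the first row of a sku and whenever the price moves
        SELECT sku, sales / qty AS price, sales, qty, date,
               CASE WHEN sales / qty = LAG(sales / qty) OVER (PARTITION BY sku ORDER BY date)
                    THEN 0 ELSE 1 END AS is_change
        FROM tbl
    )
)
GROUP BY sku, island, price
ORDER BY sku, MIN(date)
"""
island_rows = conn.execute(query).fetchall()
for row in island_rows:
    print(row)
```

Unlike the date-minus-sequence trick, this version does not depend on the dates being consecutive days, only on their order.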

Query to show all items for every months

I have two tables, shipped (item, qty_shipd, date_shpd) and forecast (item, qty_forecat, date_forecast), and I need something like the table below:
Item Forecast Shipped Forecast_date Shipped_date
item1 50 100 2018-01-01 2018-01-15
item2 0 50 - 2018-01-06
item3 100 100 2018-02-01 2018-02-05
item4 150 0 2018-02-01 -
item1 0 20 - 2018-03-15
item1 10 50 2018-04-01 2018-04-28
Is it possible to get something like this table?
Thanks so much
Hmmm. This seems rather complicated. You seem to want to combine the rows from both tables within a month, not losing any values from either one.
If so, I think this does what you want:
select item, max(shipped) as shipped, max(shipped_date) as shipped_date,
       max(forecast) as forecast, max(forecast_date) as forecast_date
from ((select item, qty_shipd as shipped, date_shpd as shipped_date,
              null as forecast, null as forecast_date,
              row_number() over (partition by item, year(date_shpd), month(date_shpd) order by date_shpd) as seqnum
       from shipped
      ) union all
      (select item, null as shipped, null as shipped_date,
              qty_forecat as forecast, date_forecast as forecast_date,
              row_number() over (partition by item, year(date_forecast), month(date_forecast) order by date_forecast) as seqnum
       from forecast
      )
     ) sf
group by item, year(coalesce(shipped_date, forecast_date)),
         month(coalesce(shipped_date, forecast_date)), seqnum
I suggest you use a calendar table (temporary or permanent).
Join the calendar table's date to both the shipped date and the forecast date using outer joins. Don't forget to join the forecast item to the shipped item, and handle the case where the data from shipped or forecast is NULL:
SELECT
ISNULL(f.item, s.item) AS Item,
c.Date,
ISNULL(qty_forecat,0) AS Forecast,
ISNULL(qty_shipd,0) AS Shipped,
date_forecast AS Forecast_date,
date_shpd AS Shipped_date
FROM
Calendar AS c
LEFT OUTER JOIN forecast AS f
ON c.Date=f.date_forecast
LEFT OUTER JOIN shipped AS s
ON c.Date=s.date_shpd
AND f.item=s.item
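For comparison, here is a minimal sketch of a third way to get the requested layout, runnable with Python's sqlite3 module. It is not either answer's method: it builds the set of (item, month) keys from both tables with UNION and then left joins each table onto those keys. The sample rows are an assumption, reconstructed from the desired output in the question, and the sketch assumes at most one shipped row and one forecast row per item per month.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE shipped (item TEXT, qty_shipd INTEGER, date_shpd TEXT)")
conn.execute("CREATE TABLE forecast (item TEXT, qty_forecat INTEGER, date_forecast TEXT)")

# Hypothetical source rows, inferred from the question's expected result.
conn.executemany("INSERT INTO shipped VALUES (?, ?, ?)", [
    ("item1", 100, "2018-01-15"), ("item2", 50, "2018-01-06"),
    ("item3", 100, "2018-02-05"), ("item1", 20, "2018-03-15"),
    ("item1", 50, "2018-04-28"),
])
conn.executemany("INSERT INTO forecast VALUES (?, ?, ?)", [
    ("item1", 50, "2018-01-01"), ("item3", 100, "2018-02-01"),
    ("item4", 150, "2018-02-01"), ("item1", 10, "2018-04-01"),
])

query = """
WITH keys AS (
    -- every (item, month) that appears in either table
    SELECT item, strftime('%Y-%m', date_shpd) AS month FROM shipped
    UNION
    SELECT item, strftime('%Y-%m', date_forecast) FROM forecast
)
SELECT k.item,
       COALESCE(f.qty_forecat, 0) AS forecast,
       COALESCE(s.qty_shipd, 0)   AS shipped,
       f.date_forecast,
       s.date_shpd
FROM keys k
LEFT JOIN forecast f
       ON f.item = k.item AND strftime('%Y-%m', f.date_forecast) = k.month
LEFT JOIN shipped s
       ON s.item = k.item AND strftime('%Y-%m', s.date_shpd) = k.month
ORDER BY k.month, k.item
"""
monthly_rows = conn.execute(query).fetchall()
for row in monthly_rows:
    print(row)
```

Missing dates come back as None (NULL) rather than the "-" shown in the question. If an item can ship or be forecast more than once in a month, the rows would need to be aggregated per month first, otherwise the two joins multiply them.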