How to calculate the median by fixing some variables?

I have a data set that I have already aggregated. It shows the median price for each cat, root_cat, and cluster on a daily basis.
date cluster root_cat cat median_price
2020-12-07 A X 1 20
2020-12-07 A X 2 15
2020-12-07 A X 2 30
2020-12-08 B Y 3 24
Here is the query that I wrote for calculating the median price.
SELECT date,
       page_impressions_cluster,
       root_cat,
       cat,
       MAX(CASE WHEN tile2 = 1 THEN min_price / 100 END) AS median
FROM (SELECT pl.*,
             NTILE(2) OVER (PARTITION BY product_id ORDER BY min_price) AS tile2
      FROM pl
      WHERE cluster IS NOT NULL
        AND date_parse(date, '%Y-%m-%d') >= current_date - INTERVAL '15' DAY) d
GROUP BY 1, 2, 3, 4
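The NTILE(2) pattern works because NTILE(2) splits each partition's ordered rows into two buckets. When the count is odd, the first bucket gets the extra row. So the largest value in bucket 1 is the middle value for odd counts, and the lower of the two middle values for even counts. Below is a minimal sketch that checks the pattern against Python's statistics.median_low. It assumes SQLite 3.25 or later (for window functions) through Python's sqlite3 module, and a hypothetical table prices(grp, price).

```python
import sqlite3
import statistics

# Hypothetical sample data: (group key, price).
ROWS = [
    ("a", 20), ("a", 15), ("a", 30),             # odd count (3 rows)
    ("b", 10), ("b", 40), ("b", 25), ("b", 5),   # even count (4 rows)
]

# Same pattern as the question: NTILE(2) per partition, then MAX over tile 1.
QUERY = """
SELECT grp, MAX(CASE WHEN tile2 = 1 THEN price END) AS median
  FROM (SELECT p.*,
               NTILE(2) OVER (PARTITION BY grp ORDER BY price) AS tile2
          FROM prices AS p) AS d
 GROUP BY grp
 ORDER BY grp
"""


def ntile_medians(rows):
    """Run the NTILE(2) median query in an in-memory SQLite database."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE prices (grp TEXT, price REAL)")
    conn.executemany("INSERT INTO prices VALUES (?, ?)", rows)
    result = dict(conn.execute(QUERY).fetchall())
    conn.close()
    return result


def lower_medians(rows):
    """Reference values: the lower median of each group, computed in Python."""
    groups = {}
    for grp, price in rows:
        groups.setdefault(grp, []).append(price)
    return {grp: statistics.median_low(vals) for grp, vals in groups.items()}


if __name__ == "__main__":
    print(ntile_medians(ROWS))  # {'a': 20.0, 'b': 10.0}
    print(lower_medians(ROWS))  # {'a': 20, 'b': 10}
```

Tiles are assigned per PARTITION BY key. The outer MAX is therefore a per-group median only when the partition uses the same columns as the GROUP BY. In the query above the partition is product_id, so this is worth double-checking.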
Now I would like to add two more columns showing the median price for each root_cat and for each cat over the last 14 days, excluding the latest day. How can I do this?
Here is the desired output:
date cluster root_cat cat median_price median_price_root median_price_cat
2020-12-07 A X 1 20 20 20
2020-12-07 A X 2 15 20 22,5
2020-12-07 A X 2 30 20 22,5
2020-12-08 B Y 3 24 24 24
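Engines differ in whether they provide an exact median function, so one hedged way to pin down the intended logic is a plain-Python sketch. It takes the aggregated rows above as input and reads "last 14 days except the latest day" as dates in [latest - 14 days, latest). It then computes the median of median_price per root_cat and per cat over that window. trailing_medians is a hypothetical name.

```python
from datetime import date, timedelta
from statistics import median

# (date, cluster, root_cat, cat, median_price), taken from the question's sample.
ROWS = [
    (date(2020, 12, 7), "A", "X", 1, 20),
    (date(2020, 12, 7), "A", "X", 2, 15),
    (date(2020, 12, 7), "A", "X", 2, 30),
    (date(2020, 12, 8), "B", "Y", 3, 24),
]


def trailing_medians(rows, days=14):
    """Append (median_price_root, median_price_cat) to every row.

    Assumption: the window is [latest - days, latest), so the most recent
    date in the data is excluded. Keys with no rows in the window get None.
    """
    latest = max(r[0] for r in rows)
    start = latest - timedelta(days=days)
    window = [r for r in rows if start <= r[0] < latest]

    # Collect the prices inside the window per root_cat and per cat.
    by_root, by_cat = {}, {}
    for _, _, root_cat, cat, price in window:
        by_root.setdefault(root_cat, []).append(price)
        by_cat.setdefault(cat, []).append(price)

    out = []
    for r in rows:
        root_vals = by_root.get(r[2])
        cat_vals = by_cat.get(r[3])
        out.append(r + (
            median(root_vals) if root_vals else None,
            median(cat_vals) if cat_vals else None,
        ))
    return out


if __name__ == "__main__":
    for row in trailing_medians(ROWS):
        print(row)
```

On the sample data, the 2020-12-08 row gets None for both columns because its only observation falls on the excluded latest day. The desired output shows 24 there, so the window boundaries may need adjusting to match the intended definition.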

If an approximation of the median is good enough, then you can use
SELECT date,
       page_impressions_cluster,
       root_cat,
       cat,
       MAX(CASE WHEN tile2 = 1 THEN min_price / 100 END) AS median,
       approx_percentile(min_price / 100, 0.5) AS approx_median -- the 0.5 quantile (50th percentile) is the median
FROM ...
See the documentation for the approx_percentile function.

Related

Find max value over the next 7 days for each group

I have a SQL table:
id date value
1 01/01/2019 50
1 01/13/2019 24
1 01/19/2019 53
2 01/05/2019 50
2 01/11/2019 24
2 01/24/2019 53
I want to create a new column that computes the max value over the next 14 days for each id. If the difference between the date in the current row and the date in the next row is greater than 14 days, return None (NULL).
The new table will be:
id date value max_14
1 01/01/2019 50 50
1 01/13/2019 24 53
1 01/19/2019 53 None
2 01/05/2019 50 50
2 01/11/2019 24 53
2 01/24/2019 53 None

You can use a sub-query for this:
select t.*,
       (select max(x.value)
          from t as x
         where x.id = t.id
           and x.date >= t.date
           and x.date < dateadd(day, 14, t.date)) as max_14
from t
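The subquery above returns a row's own value when no later row falls within 14 days, while the question asks for NULL in that case. Below is a minimal sketch of one way to add that, run in SQLite through Python's sqlite3 module: it guards the subquery with EXISTS. It assumes the dates are stored as ISO-8601 text (YYYY-MM-DD) and uses SQLite's date(t.date, '+14 days') in place of SQL Server's DATEADD.

```python
import sqlite3

# The question's sample, with dates rewritten as ISO-8601 text
# (assumption: SQLite's date() expects YYYY-MM-DD).
ROWS = [
    (1, "2019-01-01", 50),
    (1, "2019-01-13", 24),
    (1, "2019-01-19", 53),
    (2, "2019-01-05", 50),
    (2, "2019-01-11", 24),
    (2, "2019-01-24", 53),
]

# Correlated subquery for the max, guarded by EXISTS so that rows without a
# later row inside the 14-day window get NULL instead of their own value.
QUERY = """
SELECT t.id, t.date, t.value,
       CASE WHEN EXISTS (SELECT 1
                           FROM t AS n
                          WHERE n.id = t.id
                            AND n.date > t.date
                            AND n.date < date(t.date, '+14 days'))
            THEN (SELECT MAX(x.value)
                    FROM t AS x
                   WHERE x.id = t.id
                     AND x.date >= t.date
                     AND x.date < date(t.date, '+14 days'))
       END AS max_14
  FROM t
 ORDER BY t.id, t.date
"""


def max_next_14_days(rows):
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE t (id INTEGER, date TEXT, value INTEGER)")
    conn.executemany("INSERT INTO t VALUES (?, ?, ?)", rows)
    result = conn.execute(QUERY).fetchall()
    conn.close()
    return result


if __name__ == "__main__":
    for row in max_next_14_days(ROWS):
        print(row)
```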

Cumulative sum in SQL using window function

QTY STOCK RNK ID_KEY CUM_SUM
40 35 1 1 35
20 35 2 1 0
15 35 3 1 0
58 35 4 1 0
18 35 5 1 0
40 35 1 2 35
20 35 2 2 0
15 35 3 2 0
The table above shows the expected CUM_SUM, which restarts for each ID_KEY. For the first row of an ID_KEY it should be MIN(QTY, STOCK - 0) = MIN(QTY, STOCK). For every later row it should be MIN(QTY, STOCK - SUM(CUM_SUM of all previous rows)).
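The rule is recursive (each row depends on the values already assigned to earlier rows), which is easiest to state as a loop. Below is a small Python sketch. It assumes STOCK is constant within an ID_KEY and that rows are processed in RNK order; capped_cum_sum is a hypothetical name.

```python
# (QTY, STOCK, RNK, ID_KEY) rows from the question.
ROWS = [
    (40, 35, 1, 1), (20, 35, 2, 1), (15, 35, 3, 1), (58, 35, 4, 1), (18, 35, 5, 1),
    (40, 35, 1, 2), (20, 35, 2, 2), (15, 35, 3, 2),
]


def capped_cum_sum(rows):
    """Return CUM_SUM for each row, in input order.

    Rows of each ID_KEY are visited in RNK order; each row gets
    MIN(QTY, STOCK - sum of CUM_SUM already assigned in that ID_KEY).
    """
    assigned = {}  # ID_KEY -> total CUM_SUM assigned so far
    result = [None] * len(rows)
    order = sorted(range(len(rows)), key=lambda i: (rows[i][3], rows[i][2]))
    for i in order:
        qty, stock, _rnk, key = rows[i]
        used = assigned.get(key, 0)
        value = min(qty, stock - used)
        assigned[key] = used + value
        result[i] = value
    return result


if __name__ == "__main__":
    print(capped_cum_sum(ROWS))  # [35, 0, 0, 0, 0, 35, 0, 0]
```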
QTY STOCK RNK ID_KEY CUM_SUM
40 35 1 1 5
20 35 2 1 -10
15 35 3 1 -30
58 35 4 1 -7
18 35 5 1 -24
40 35 1 2 5
20 35 2 2 -10
15 35 3 2 -30
The table above is the output I get with the query I tried:
SELECT SUM(QTY - STOCK) OVER (
           PARTITION BY ID_KEY
           ORDER BY RNK
           ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW
       ) AS CUM_SUM
FROM yourTable
How can I get the correct CUM_SUM values with a window function on the existing table?

You may use a rolling SUM() here, using SUM() as an analytic function:
SELECT *, SUM(QTY - STOCK) OVER (PARTITION BY ID_KEY ORDER BY RNK) AS CUM_SUM
FROM yourTable
ORDER BY ID_KEY, RNK;
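A running SUM(QTY - STOCK) like the one above produces the running difference from the second table (5, -10, -30, ...), not the capped values. The values assigned to earlier rows add up to MIN(STOCK, sum of earlier QTY). So the rule can be rewritten without recursion as MIN(QTY, MAX(STOCK - sum of earlier QTY, 0)), and the sum of earlier QTY is a window frame ending at 1 PRECEDING.

The sketch below is an alternative reformulation, not the answer's query. It runs in SQLite 3.25+ via Python's sqlite3, assumes STOCK is constant within an ID_KEY, and reuses the names yourTable, ID_KEY, and CUM_SUM from the answer.

```python
import sqlite3

# (QTY, STOCK, RNK, ID_KEY) rows from the question.
ROWS = [
    (40, 35, 1, 1), (20, 35, 2, 1), (15, 35, 3, 1), (58, 35, 4, 1), (18, 35, 5, 1),
    (40, 35, 1, 2), (20, 35, 2, 2), (15, 35, 3, 2),
]

# prev_qty = sum of QTY over the earlier rows of the same ID_KEY (0 for the first row).
# CUM_SUM  = MIN(QTY, MAX(STOCK - prev_qty, 0)); two-argument MIN/MAX are SQLite's
# scalar functions (many other engines call them LEAST/GREATEST).
QUERY = """
SELECT QTY, STOCK, RNK, ID_KEY,
       MIN(QTY, MAX(STOCK - prev_qty, 0)) AS CUM_SUM
  FROM (SELECT t.*,
               COALESCE(SUM(QTY) OVER (PARTITION BY ID_KEY ORDER BY RNK
                                       ROWS BETWEEN UNBOUNDED PRECEDING
                                                AND 1 PRECEDING), 0) AS prev_qty
          FROM yourTable AS t) AS s
 ORDER BY ID_KEY, RNK
"""


def window_cum_sum(rows):
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE yourTable (QTY INTEGER, STOCK INTEGER, RNK INTEGER, ID_KEY INTEGER)"
    )
    conn.executemany("INSERT INTO yourTable VALUES (?, ?, ?, ?)", rows)
    result = [row[4] for row in conn.execute(QUERY)]
    conn.close()
    return result


if __name__ == "__main__":
    print(window_cum_sum(ROWS))  # [35, 0, 0, 0, 0, 35, 0, 0]
```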

Sales amounts of the top n selling vendors by month with other fields in bigquery

I have a table in BigQuery like this (260,000 rows):
vendor date item_price discount_price
x 2021-07-08 23:41:10 451,5 0
y 2021-06-14 10:22:10 41,7 0
z 2020-01-03 13:41:12 74 4
s 2020-04-12 01:14:58 88 12
....
What I want is to group this data by month and, for each month, get the sales totals of only the top 20 vendors. Expected output:
month vendor_name(top20) sum_of_vendor's_sales sum_of_vendor's_discount item_count(sold)
2020-01 x1 10857 250 150
2020-01 x2 9685 410 50
2020-01 x3 3574 140 45
....
2020-01 x20 700 15 20
2020-02 y1 7421 280 120
2020-02 y2 6500 250 40
2020-02 y3 4500 200 70
.....
2020-02 y20 900 70 30
I tried the query below, but it does not produce the desired output:
select month,
(select sum(sum) from t.top_20_vendors) as sum_of_only_top20_vendor_sales
from (
select
format_datetime('%Y%m', date) month,
approx_top_sum(vendor, item_price, 20) top_20_vendors,count(item_price) as count_of_items,sum(discount_price)
from my_table
group by month
) t

Consider the approach below:
select
format_datetime('%Y%m', date) month,
vendor as vendor_name_top20,
sum(item_price) as sum_of_vendor_sales,
sum(discount_price) as sum_of_vendor_discount,
count(*) as item_count_sold
from your_table
group by vendor, month
qualify row_number() over(partition by month order by sum_of_vendor_sales desc) <= 20
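SQLite has no QUALIFY clause. A portable way to express the same top-N-per-month filter is to compute ROW_NUMBER() in a subquery and filter on it outside. Below is a minimal sketch in SQLite via Python's sqlite3. It uses the question's column names but hypothetical rows, and a top 2 instead of a top 20 for brevity.

```python
import sqlite3

# Hypothetical sample rows: (vendor, date, item_price, discount_price).
ROWS = [
    ("x", "2020-01-03 13:41:12", 74.0, 4.0),
    ("x", "2020-01-10 09:00:00", 26.0, 0.0),
    ("y", "2020-01-15 10:22:10", 41.7, 0.0),
    ("z", "2020-01-20 01:14:58", 88.0, 12.0),
    ("y", "2020-02-12 01:14:58", 88.0, 12.0),
    ("x", "2020-02-14 10:22:10", 41.7, 0.0),
]

# 1) aggregate per (month, vendor), 2) rank vendors inside each month,
# 3) keep the top N ranks (the QUALIFY step in the BigQuery answer).
QUERY = """
SELECT month, vendor, sales, discount, item_count
  FROM (SELECT g.*,
               ROW_NUMBER() OVER (PARTITION BY month
                                  ORDER BY sales DESC, vendor) AS rn
          FROM (SELECT strftime('%Y-%m', date) AS month,
                       vendor,
                       SUM(item_price)     AS sales,
                       SUM(discount_price) AS discount,
                       COUNT(*)            AS item_count
                  FROM my_table
                 GROUP BY strftime('%Y-%m', date), vendor) AS g) AS ranked
 WHERE rn <= ?
 ORDER BY month, rn
"""


def top_vendors_by_month(rows, n):
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE my_table (vendor TEXT, date TEXT, item_price REAL, discount_price REAL)"
    )
    conn.executemany("INSERT INTO my_table VALUES (?, ?, ?, ?)", rows)
    result = conn.execute(QUERY, (n,)).fetchall()
    conn.close()
    return result


if __name__ == "__main__":
    for row in top_vendors_by_month(ROWS, 2):
        print(row)
```

The secondary sort on vendor only makes ties deterministic; with ROW_NUMBER, tied vendors at the cut-off are not all kept.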

Aggregate value when condition and partition by

I have the table below and need to aggregate it:
Id Month Days Hours Audit
1 201803 20 30 Yes
1 201803 20 15 Yes
1 201802 19 4 No
2 201803 20 5 Yes
Expected output:
Id Month Days Hours Audit Total
1 201803 20 2 Yes 100
1 201803 20 3 Yes 100
1 201802 10 4 No
2 201803 20 5 Yes 100
Summary:
Partition by Id and Month
Sum Days * Hours over the rows where Audit is Yes
My attempt:
SELECT (CASE
WHEN AUDIT IN ('YES')
THEN HOURS * DAYS
END) OVER (PARTITION BY ID ,c.month) AS TOTAL
FROM TABLEA

Use SUM as the window function:
SELECT t.*,
       SUM(CASE WHEN AUDIT = 'Yes' THEN HOURS * DAYS END)
           OVER (PARTITION BY ID, MONTH) AS TOTAL
FROM TABLEA t
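Below is a minimal sketch of the same conditional window sum in SQLite via Python's sqlite3, using the question's input rows; the lower-case table name tablea is an assumption. With these input rows, the Id 1 / 201803 total comes out as 20*30 + 20*15 = 900 rather than the 100 in the expected output, which appears to assume Hours of 2 and 3.

```python
import sqlite3

# Input rows from the question: (Id, Month, Days, Hours, Audit).
ROWS = [
    (1, 201803, 20, 30, "Yes"),
    (1, 201803, 20, 15, "Yes"),
    (1, 201802, 19, 4, "No"),
    (2, 201803, 20, 5, "Yes"),
]

# The CASE yields NULL for non-audited rows, and SUM ignores NULLs, so each
# (Id, Month) partition totals Hours * Days over its audited rows only;
# a partition with no audited rows gets NULL.
QUERY = """
SELECT t.*,
       SUM(CASE WHEN Audit = 'Yes' THEN Hours * Days END)
           OVER (PARTITION BY Id, Month) AS Total
  FROM tablea AS t
 ORDER BY Id, Month DESC, Hours DESC
"""


def audited_totals(rows):
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE tablea (Id INTEGER, Month INTEGER, Days INTEGER, Hours INTEGER, Audit TEXT)"
    )
    conn.executemany("INSERT INTO tablea VALUES (?, ?, ?, ?, ?)", rows)
    result = conn.execute(QUERY).fetchall()
    conn.close()
    return result


if __name__ == "__main__":
    for row in audited_totals(ROWS):
        print(row)
```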

SQL Sumif statement request

I would like to create a column with the total hours per store, based on the store and hours columns (see the table below). It should total rep1, rep2, and rep3 for store 142, and rep1 and rep2 for store 356. I would then like to divide hours by total to get a contribution % column.
Date store rep hours total cont%
--------------------------------------------------
x 142 rep1 5 11 0.45
x 142 rep2 2 11 0.18
x 142 rep3 4 11 0.36
x 356 rep1 4 7 0.57
x 356 rep2 3 7 0.42
Thank you!

You want window functions:
select t.*, sum(hours) over (partition by store) as total,
t.hours * 1.0 / sum(hours) over (partition by store) as cont_percent
from t;
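Below is a minimal sketch running the answer's query in SQLite via Python's sqlite3 on the question's rows. The table name t and the column name cont (instead of cont%) are assumptions. Multiplying by 1.0 avoids integer division when hours is an integer column.

```python
import sqlite3

# Rows from the question: (date, store, rep, hours).
ROWS = [
    ("x", 142, "rep1", 5),
    ("x", 142, "rep2", 2),
    ("x", 142, "rep3", 4),
    ("x", 356, "rep1", 4),
    ("x", 356, "rep2", 3),
]

# total = store-level sum of hours; cont = each row's share of that total.
QUERY = """
SELECT t.*,
       SUM(hours) OVER (PARTITION BY store) AS total,
       t.hours * 1.0 / SUM(hours) OVER (PARTITION BY store) AS cont
  FROM t
 ORDER BY store, rep
"""


def store_contributions(rows):
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE t (date TEXT, store INTEGER, rep TEXT, hours INTEGER)")
    conn.executemany("INSERT INTO t VALUES (?, ?, ?, ?)", rows)
    result = conn.execute(QUERY).fetchall()
    conn.close()
    return result


if __name__ == "__main__":
    for row in store_contributions(ROWS):
        print(row)
```

Rounded to two decimals, cont is 0.45, 0.18, 0.36, 0.57, and 0.43. The question's 0.42 for the last row looks truncated rather than rounded (3/7 is about 0.4286).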
