SQL Sumif statement request - sql

I would like to create a column that holds the total hours per store, based on the store and hours columns (see the table below). It should total rep1, rep2 and rep3 for store 142, and rep1 and rep2 for store 356. I would then also like to divide hours by that total to get a contribution % column.
Date store rep hours total cont%
--------------------------------------------------
x 142 rep1 5 11 0.45
x 142 rep2 2 11 0.18
x 142 rep3 4 11 0.36
x 356 rep1 4 7 0.57
x 356 rep2 3 7 0.42
Thank you!

You want window functions:
select t.*, sum(hours) over (partition by store) as total,
t.hours * 1.0 / sum(hours) over (partition by store) as cont_percent
from t;
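To try the query end to end, here is a minimal sketch that runs it through Python's built-in sqlite3 module. The choice of SQLite is an assumption (the question does not name a database, and SQLite needs version 3.25 or later for window functions); the column types are guessed from the sample table.

```python
import sqlite3

# In-memory database; column types are assumptions based on the sample data.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (date TEXT, store INTEGER, rep TEXT, hours INTEGER)")
conn.executemany(
    "INSERT INTO t VALUES (?, ?, ?, ?)",
    [("x", 142, "rep1", 5), ("x", 142, "rep2", 2), ("x", 142, "rep3", 4),
     ("x", 356, "rep1", 4), ("x", 356, "rep2", 3)],
)

rows = conn.execute(
    """
    SELECT store, rep, hours,
           SUM(hours) OVER (PARTITION BY store) AS total,
           hours * 1.0 / SUM(hours) OVER (PARTITION BY store) AS cont_percent
    FROM t
    ORDER BY store, rep
    """
).fetchall()

for store, rep, hours, total, cont in rows:
    print(store, rep, hours, total, round(cont, 2))
```

The `* 1.0` matters here: without it, databases that use integer division would return 0 for every contribution.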

Related

Postgresql calculate percentage of values out of sum of values of specific rows

I need to calculate percentage of hours per each project, not out of all the quantity of projects.
Here is the initial table:
employee_id  project_id  hours
999111111    1           31.4
999111111    2           8.5
999333333    3           42.1
999888888    1           21.0
999888888    2           22.0
999444444    2           12.2
999444444    3           10.5
999444444    1           null
999444444    10          10.1
999444444    20          11.8
999887777    30          30.8
999887777    10          10.2
999222222    10          34.5
999222222    30          5.1
999555555    30          19.2
999555555    20          14.8
999666666    20          null
Needed output:
employee_id  project_id  percent
999111111    1           60
999111111    2           20
999333333    3           80
999888888    1           40
999888888    2           52
999444444    2           29
999444444    3           20
999444444    1           null
999444444    10          18
999444444    20          44
999887777    30          56
999887777    10          19
999222222    10          63
999222222    30          9
999555555    30          35
999555555    20          56
999666666    20          null
I understand how to calculate out of overall COUNT of all hours, but I need percentage per employee out of COUNT of hours within the same project ID, and that's what I'm struggling with. How can it be done?
Assuming every project has at least one record with a non-zero hours value (otherwise the denominator would be zero or NULL), we can try this query:
SELECT employee_id, project_id,
100.0 * hours / SUM(hours) OVER (PARTITION BY project_id) AS percent
FROM yourTable
ORDER BY project_id, employee_id;
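As a quick check of the per-project query, the sketch below runs it on the rows for projects 1 and 2 only. It uses Python's built-in sqlite3 module instead of PostgreSQL, which is an assumption made to keep the example self-contained (SQLite 3.25 or later is needed for window functions).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE yourTable (employee_id INTEGER, project_id INTEGER, hours REAL)")
conn.executemany(
    "INSERT INTO yourTable VALUES (?, ?, ?)",
    [
        (999111111, 1, 31.4), (999888888, 1, 21.0), (999444444, 1, None),
        (999111111, 2, 8.5), (999888888, 2, 22.0), (999444444, 2, 12.2),
    ],
)

rows = conn.execute(
    """
    SELECT employee_id, project_id,
           100.0 * hours / SUM(hours) OVER (PARTITION BY project_id) AS percent
    FROM yourTable
    ORDER BY project_id, employee_id
    """
).fetchall()

# SUM() skips NULL hours, and a NULL numerator yields a NULL (None) percent.
percent = {(emp, proj): pct for emp, proj, pct in rows}
for key, pct in percent.items():
    print(key, None if pct is None else round(pct, 1))
```

Rounded to whole numbers, employee 999111111 gets 60 for project 1 and employee 999888888 gets 52 for project 2, in line with the needed output; the row with null hours stays null.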
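You can also use window functions to calculate each project's total hours as a percentage of all hours in the table (note that this is the project's share of the grand total, not each employee's share within a project):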
SELECT employee_id, project_id,
100.0 * SUM(hours) OVER (PARTITION BY project_id) / SUM(hours) OVER () AS percent
FROM yourTable
ORDER BY employee_id, project_id;

Cumulated Cohorts in SQL

I have the following table:
cohort   month cohort  orders  cumulated orders
2021-01  0             126     126
2021-01  1             5       131
2021-01  2             4       135
2021-02  0             131     131
2021-02  1             9       140
2021-02  2             8       148
Now I want the following table, where the cumulated orders in each row are divided by the number of orders in month 0 of the same cohort:
cohort   month cohort  orders  cumulated orders  cumulated in %
2021-01  0             126     126               100%
2021-01  1             5       131               104%
2021-01  2             4       135               107%
2021-02  0             131     131               100%
2021-02  1             9       140               107%
2021-02  2             8       148               114%
My only idea is to use a CASE expression, but I don't want to update the query every month by adding a line such as
WHEN cohort="2021-08" THEN cumulated orders / 143
where 143 is the number of orders for cohort 2021-08 at month cohort = 0.
Does anyone have an idea how to get this table?
A case expression isn't needed. You can use first_value():
select t.*,
       ( cumulated_orders /
         first_value(orders) over (partition by cohort order by month_cohort)
       ) as ratio
from t;
If you really wanted a case, you could use:
select t.*,
       ( cumulated_orders /
         max(case when month_cohort = 0 then orders end) over (partition by cohort)
       ) as ratio
from t;
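The two variants above can be compared side by side with a small sketch. It runs on Python's built-in sqlite3 module, which is an assumption (the question does not name a database; SQLite 3.25 or later is needed for window functions). The column names month_cohort and cumulated_orders are the underscore forms used in the answers, and the factor 100.0 is added here to get a percentage and to avoid integer division.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE t (cohort TEXT, month_cohort INTEGER, orders INTEGER, cumulated_orders INTEGER)"
)
conn.executemany(
    "INSERT INTO t VALUES (?, ?, ?, ?)",
    [
        ("2021-01", 0, 126, 126), ("2021-01", 1, 5, 131), ("2021-01", 2, 4, 135),
        ("2021-02", 0, 131, 131), ("2021-02", 1, 9, 140), ("2021-02", 2, 8, 148),
    ],
)

rows = conn.execute(
    """
    SELECT cohort, month_cohort,
           -- 100.0 forces floating-point division in SQLite
           100.0 * cumulated_orders
               / FIRST_VALUE(orders) OVER (PARTITION BY cohort ORDER BY month_cohort) AS pct_first,
           100.0 * cumulated_orders
               / MAX(CASE WHEN month_cohort = 0 THEN orders END) OVER (PARTITION BY cohort) AS pct_case
    FROM t
    ORDER BY cohort, month_cohort
    """
).fetchall()

# Both expressions divide by the month-0 orders of the row's own cohort.
percentages = [round(pct_first) for _, _, pct_first, _ in rows]
print(percentages)
```

Both columns agree on every row. Note that the last row works out to 148 / 131, roughly 113%, whereas the table in the question lists 114%.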
Consider below
select *,
round(100 * cumulated_orders /
sum(if(month_cohort = 0, orders, 0)) over(partition by cohort)
) as cumulated_in_percent
from `project.dataset.table`
if applied to sample data in your question - output is

How to calculate the median by fixing some variables?

I have a data set that I have already aggregated. It shows the median price for each cat, root_cat, and cluster on a daily basis.
date cluster root_cat cat median_price
2020-12-07 A X 1 20
2020-12-07 A X 2 15
2020-12-07 A X 2 30
2020-12-08 B Y 3 24
Here is the query that I wrote for calculating the median price.
SELECT date,
       page_impressions_cluster,
       root_cat,
       cat,
       MAX(CASE WHEN tile2 = 1 THEN min_price / 100 END) AS median
FROM (
    SELECT pl.*,
           NTILE(2) OVER (PARTITION BY product_id ORDER BY min_price) AS tile2
    FROM pl
    WHERE cluster IS NOT NULL
      AND date_parse(date, '%Y-%m-%d') >= current_date - interval '15' day
) d
GROUP BY 1, 2, 3, 4
Now I would like to add columns that show the median price for each root_cat and for each cat over the last 14 days, excluding the latest day. How can I do this?
Here is the desired output:
date cluster root_cat cat median_price median_price_root median_price_cat
2020-12-07 A X 1 20 20 20
2020-12-07 A X 2 15 20 22,5
2020-12-07 A X 2 30 20 22,5
2020-12-08 B Y 3 24 24 24
If an approximation of the median is good enough, then you can use
SELECT date,
       page_impressions_cluster,
       root_cat,
       cat,
       MAX(CASE WHEN tile2 = 1 THEN min_price / 100 END) AS median,
       approx_percentile(price, 0.5) -- <<== the 0.5 quantile (50th percentile) is the median
FROM ...
See the documentation for the approx_percentile function.
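To make the target numbers concrete, the sketch below recomputes the two extra columns of the desired output in plain Python. It uses the exact median from the standard library rather than an approximate percentile, groups by root_cat and by cat separately (an assumption read off the desired output), and ignores the 14-day window since the sample has only two dates.

```python
from collections import defaultdict
from statistics import median

# (date, cluster, root_cat, cat, median_price) rows from the sample in the question
rows = [
    ("2020-12-07", "A", "X", 1, 20),
    ("2020-12-07", "A", "X", 2, 15),
    ("2020-12-07", "A", "X", 2, 30),
    ("2020-12-08", "B", "Y", 3, 24),
]

by_root = defaultdict(list)
by_cat = defaultdict(list)
for _, _, root_cat, cat, price in rows:
    by_root[root_cat].append(price)
    by_cat[cat].append(price)

# Exact medians per group; an even-sized group averages its two middle values.
median_root = {key: median(prices) for key, prices in by_root.items()}
median_cat = {key: median(prices) for key, prices in by_cat.items()}

for date, cluster, root_cat, cat, price in rows:
    print(date, cluster, root_cat, cat, price, median_root[root_cat], median_cat[cat])
```

For cat 2 the two prices 15 and 30 give 22.5, which matches the 22,5 shown in the desired output.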

SQL Server : Segregate data into dynamic buckets

Please help me with a SQL Server Query that can bucket data dynamically into ranges.
Here is my source data:
Value
=======
45
33.5
33.1
33
32.8
25.3
25.2
25.1
25
21.3
21.2
21.1
20.9
12.3
12.2
12.15
12.1
12
11.8
Expected output:
Value Rank
=============
45 1
(mean value in this range is 45)
33.5 2
33.1 2
33 2
32.8 2
(mean value is 33.1 - any value in the range (-10%) 29.79 to 36.41 (+10%) should be given a rank of 2)
25.3 3
25.2 3
25.1 3
25 3
21.3 4
21.2 4
21.1 4
20.9 4
12.3 5
12.2 5
12.15 5
12.1 5
12 5
11.8 5
DENSE_RANK, RANK and NTILE do not seem to give me a ranking like this. The ranges are dynamic and not known in advance. Any help is highly appreciated.
The bucketing rule is:
Each bucket contains a data set with 10% variation from the mean value
Here's one way:
select val, dense_rank() over (order by cast(val/10 as int) desc) ntile
from yourtable
Use dense_rank but specify your buckets in the order by clause. (I'm assuming this is how it works for your sample data)
First, convert the value to a number with 2 decimal places.
Then use a CASE expression to apply either FLOOR or ROUND, depending on the first digit after the decimal point.
Finally, use the DENSE_RANK function to assign a rank based on the rounded value.
Query
select z.[Value],
       dense_rank() over (order by z.[val_rounded] desc) as [Rank]
from (
    select t.[Value],
           case when substring(t.[Value2], charindex('.', t.[Value2], 1) + 1, 1) > 5
                then round(t.[Value], 0)
                else floor(t.[Value])
           end as [val_rounded]
    from (
        select [Value],
               cast(cast([Value] as decimal(6, 2)) as varchar(50)) as [Value2]
        from [your_table_name]
    ) t
) z;
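The same three steps can be traced outside the database. The sketch below mirrors them in plain Python on the sample values; the helper name bucket_key is hypothetical, and Decimal is used so that values such as 12.15 keep their exact digits.

```python
from decimal import Decimal, ROUND_FLOOR, ROUND_HALF_UP

values = ["45", "33.5", "33.1", "33", "32.8", "25.3", "25.2", "25.1", "25",
          "21.3", "21.2", "21.1", "20.9", "12.3", "12.2", "12.15", "12.1", "12", "11.8"]


def bucket_key(text):
    """Round up only when the first digit after the decimal point exceeds 5."""
    value = Decimal(text).quantize(Decimal("0.01"))    # step 1: two decimal places
    first_decimal = int(str(value).split(".")[1][0])   # digit right after the point
    mode = ROUND_HALF_UP if first_decimal > 5 else ROUND_FLOOR  # step 2: ROUND or FLOOR
    return value.quantize(Decimal("1"), rounding=mode)


keys = [bucket_key(v) for v in values]
# step 3: DENSE_RANK over the rounded value, highest first
rank_of = {key: rank for rank, key in enumerate(sorted(set(keys), reverse=True), start=1)}
ranks = [rank_of[key] for key in keys]

for value, rank in zip(values, ranks):
    print(value, rank)
```

On this sample the rounded keys are 45, 33, 25, 21 and 12, so the ranks line up with the expected output. This is rounding to whole numbers, though, not a general 10%-of-mean rule, so other data sets may bucket differently.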

Calculate Sub Query Column Based On Calculated Column

I have a table ScheduleRotationDetail that contains these as columns:
ScheduleRotationID ScheduleID Ordinal Duration
379 61 1 1
379 379 2 20
379 512 3 1
379 89 4 20
I have a query that goes like this in order to get the day of the year each schedule is supposed to start on:
SELECT ScheduleID, Ordinal, Duration
,Duration * 7 AS DurationDays
,( SELECT ( ISNULL( SUM(ISNULL( Duration, 0 )), 0 ) - 1 ) * 7
FROM ScheduleRotationDetail WHERE ScheduleRotationID = srd.ScheduleRotationID
AND Ordinal <= srd.Ordinal ) AS StartDay
FROM ScheduleRotationDetail srd
WHERE srd.ScheduleRotationID = 379
That outputs this as the result set:
ScheduleID Ordinal Duration DurationDays StartDay
61 1 1 7 0
379 2 20 140 140
512 3 1 7 147
89 4 20 140 287
However, the StartDay values I need are:
0
7
147
154
I have tried CTEs but can't get them to work, so I've come here for advice.
It looks like you want a cumulative sum. In SQL Server 2012+, you can do:
SELECT ScheduleID, Ordinal, Duration,
SUM(Duration*7) OVER (ORDER BY Ordinal) - Duration*7 as StartDate
FROM ScheduleRotationDetail srd ;
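To check the cumulative sum against the sample rows, here is a minimal sketch that runs the same expression through Python's built-in sqlite3 module. SQLite stands in for SQL Server here, which is an assumption made only to keep the example runnable (SQLite 3.25 or later is needed); the WHERE clause from the question is kept, and the alias StartDay follows the question's column name.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE ScheduleRotationDetail "
    "(ScheduleRotationID INTEGER, ScheduleID INTEGER, Ordinal INTEGER, Duration INTEGER)"
)
conn.executemany(
    "INSERT INTO ScheduleRotationDetail VALUES (?, ?, ?, ?)",
    [(379, 61, 1, 1), (379, 379, 2, 20), (379, 512, 3, 1), (379, 89, 4, 20)],
)

rows = conn.execute(
    """
    SELECT ScheduleID, Ordinal, Duration,
           -- running total of days up to this row, minus this row's own days
           SUM(Duration * 7) OVER (ORDER BY Ordinal) - Duration * 7 AS StartDay
    FROM ScheduleRotationDetail
    WHERE ScheduleRotationID = 379
    ORDER BY Ordinal
    """
).fetchall()

start_days = [row[3] for row in rows]
print(start_days)
```

Subtracting the current row's own days turns the inclusive running total into the day on which each schedule starts. If the table holds several rotations, adding PARTITION BY ScheduleRotationID to the OVER clause would keep their totals separate.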
In earlier versions, you can use APPLY for this purpose (or a correlated subquery).