Dividing a fixed number over multiple rows by weight - sql

So I have a table that has this structure:
Id - Team - User - Weight
For example: In a team we have the goal to reach 500 calls, so I'd have to spread the 500 calls by weight over each user in the team so I get this structure returned:
Id - Team - User - Weight - # Calls to be made
I know I can do this with an OVER (PARTITION BY ...) clause, and that works perfectly except for one detail: there's no way to make half a call.
I need the distribution in whole numbers, with no decimal places.
The basic query:
select Id,Team,User,Weight from CallList
4 ED PEDRO 1
5 RE PEDRO 1
6 PO ROOEIC 0,5
7 PO ROOEIC01 0,5
1 AP APSYSL 0,333333333333333
2 AP APSYSL01 0,333333333333333
3 AP APSYSL02 0,333333333333333
And this is what I would want returned
4 ED PEDRO 1 500
5 RE PEDRO 1 500
6 PO ROOEIC 0,5 250
7 PO ROOEIC01 0,5 250
1 AP APSYSL 0,333333333333333 167
2 AP APSYSL01 0,333333333333333 167
3 AP APSYSL02 0,333333333333333 166

This is an arithmetic problem. The idea is to get a "base" value for each id in a team. Then calculate the excess and incrementally add the excess.
select t.*,
       (start_value +
        (case when row_number() over (partition by team order by id) <= excess
              then 1 else 0
         end)
       ) as calls
from (select t.*,
             floor(weight * 500) as start_value,
             500 - sum(floor(weight * 500)) over (partition by team) as excess
      from t
     ) t;
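This adds the excess starting from the first row in each team (ordered by id), which matches the 167, 167, 166 split in your sample output. If you want the extra calls handed out from the last row instead, use order by id desc in the row_number().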
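As a cross-check of the arithmetic, here is a minimal Python sketch of the same idea: floor each share, then hand out the leftover calls one at a time in id order. The function name distribute_calls and the tuple layout are made up for illustration; the rows are the sample data from the question.

```python
import math
from collections import defaultdict


def distribute_calls(rows, total=500):
    """rows: iterable of (id, team, user, weight). Returns {id: whole number of calls}."""
    by_team = defaultdict(list)
    for row in rows:
        by_team[row[1]].append(row)

    calls = {}
    for team_rows in by_team.values():
        team_rows.sort(key=lambda r: r[0])  # same role as "order by id"
        base = {r[0]: math.floor(r[3] * total) for r in team_rows}
        excess = total - sum(base.values())  # calls lost to rounding down
        for position, r in enumerate(team_rows, start=1):
            # the first `excess` rows of the team each get one extra call
            calls[r[0]] = base[r[0]] + (1 if position <= excess else 0)
    return calls


SAMPLE = [
    (4, "ED", "PEDRO", 1),
    (5, "RE", "PEDRO", 1),
    (6, "PO", "ROOEIC", 0.5),
    (7, "PO", "ROOEIC01", 0.5),
    (1, "AP", "APSYSL", 0.333333333333333),
    (2, "AP", "APSYSL01", 0.333333333333333),
    (3, "AP", "APSYSL02", 0.333333333333333),
]

result = distribute_calls(SAMPLE)
for row_id in sorted(result):
    print(row_id, result[row_id])
```

Each team's numbers add up to exactly 500 as long as the weights in the team sum to (roughly) 1, because the excess is by construction whatever the floors left over.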

Simplifying Gordon's logic to remove the Derived Table:
SELECT *
   ,floor(500 * weight) -- always <= 500
    + CASE -- check if there's a remainder n, if yes split it across n rows
        WHEN COUNT(*) -- get a (kind of) random row, using row_number/order you might define a specific row
             OVER (PARTITION BY Team
                   ROWS UNBOUNDED PRECEDING)
             <= (500 - SUM(floor(500*weight)) -- remainder calculation
                       OVER (PARTITION BY Team))
        THEN 1
        ELSE 0
      END
FROM CallList

Related

Number of rows per "percentile"

I would like a Postgres query returning the number of rows per percentile.
Input:
id | name      | price
---+-----------+------
 1 | apple     |    12
 2 | banana    |     6
 3 | orange    |    18
 4 | pineapple |    26
 4 | lemon     |    30
Desired output:
percentile_3_1 | percentile_3_2 | percentile_3_3
---------------+----------------+---------------
             1 |              2 |              2
percentile_3_1 = number of fruits in the 1st 3-percentile (i.e. with a price < 10)
Postgres has the window function ntile() and a number of very useful ordered-set aggregate functions for percentiles. But you seem to have the wrong term.
number of fruits in the 1st 3-percentile (i.e. with a price < 10)
That's not a "percentile". That's the count of rows with a price below a third of the maximum.
Assuming price is defined numeric NOT NULL CHECK (price > 0), here is a generalized query to get row counts for any given number of partitions:
WITH bounds AS (
   SELECT *
   FROM (
      SELECT bound AS lo, lead(bound) OVER (ORDER BY bound) AS hi
      FROM (
         SELECT generate_series(0, x, x/3) AS bound -- number of partitions here!
         FROM (SELECT max(price) AS x FROM tbl) x
         ) sub1
      ) sub2
   WHERE hi IS NOT NULL
   )
SELECT b.hi, count(t.price)
FROM bounds b
LEFT JOIN tbl t ON t.price > b.lo AND t.price <= b.hi
GROUP BY 1
ORDER BY 1;
Result:
hi | count
--------------------+------
10.0000000000000000 | 1
20.0000000000000000 | 2
30.0000000000000000 | 2
Notably, each partition includes the upper bound, as this makes more sense while deriving partitions from the maximum value. So your quote would read:
i.e. with a price <= 10
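To make the bucketing rule concrete, here is a small Python sketch of the same logic outside the database: split the range from 0 to the maximum price into equal-width partitions and count prices with lo < price <= hi. The function name counts_per_partition is invented for this example, and Fraction is used only to keep the bounds exact.

```python
from fractions import Fraction


def counts_per_partition(prices, partitions=3):
    """Return [(upper_bound, count)] for equal-width ranges from 0 to max(prices)."""
    top = Fraction(max(prices))
    bounds = [top * i / partitions for i in range(partitions + 1)]
    return [
        # each partition excludes its lower bound and includes its upper bound
        (hi, sum(1 for p in prices if lo < p <= hi))
        for lo, hi in zip(bounds, bounds[1:])
    ]


PRICES = [12, 6, 18, 26, 30]  # the sample prices from the question

for hi, count in counts_per_partition(PRICES):
    print(float(hi), count)
```

Like the query, this assumes every price is greater than 0; a price of exactly 0 would fall outside the first partition.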

How to get the values for every group of the top 3 types

I've got this table ratings:
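id | user_id | type | value
---+---------+------+------
 0 |       0 | Rest |     4
 1 |       0 | Bar  |     3
 2 |       0 | Cine |     2
 3 |       0 | Cafe |     1
 4 |       1 | Rest |     4
 5 |       1 | Bar  |     3
 6 |       1 | Cine |     2
 7 |       1 | Cafe |     5
 8 |       2 | Rest |     4
 9 |       2 | Bar  |     3
10 |       3 | Cine |     2
11 |       3 | Cafe |     5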
I want to have a table with a row for every pair (user_id, type) for the top 3 rated types through all users (ranked by sum(value) across the whole table).
Desired result:
user_id | type | value
--------+------+------
      0 | Rest |     4
      0 | Cafe |     1
      0 | Bar  |     3
      1 | Rest |     4
      1 | Cafe |     5
      1 | Bar  |     3
      2 | Rest |     4
      3 | Cafe |     5
      2 | Bar  |     3
I was able to do this with two queries, one to get the top 3 and then another to get the rows where the type matches the top 3 types.
Does someone know how to fit this into a single query?
Get rows per user for the 3 highest ranking types, where types are ranked by the total sum of their value across the whole table.
So it's not exactly about the top 3 types per user, but about the top 3 types overall. Not all users will have rows for the top 3 types, even if there would be 3 or more types for the user.
Strategy:
Aggregate to get summed values per type (type_rnk).
Take only the top 3. (Break ties ...)
Join back to main table, eliminating any other types.
Order result by user_id, type_rnk DESC
SELECT r.user_id, r.type, r.value
FROM ratings r
JOIN (
   SELECT type, sum(value) AS type_rnk
   FROM ratings
   GROUP BY 1
   ORDER BY type_rnk DESC, type -- tiebreaker
   LIMIT 3 -- strictly the top 3
   ) v USING (type)
ORDER BY user_id, type_rnk DESC;
Since multiple types can have the same ranking, I added type to the sort order to break ties alphabetically by their name (as you did not specify otherwise).
As it turns out, we don't need window functions (the ones with OVER and, optionally, PARTITION BY) for this, since you asked in a comment.
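If you want to try the join approach without a Postgres instance, the sketch below runs essentially the same query against an in-memory SQLite database from Python. SQLite is my substitution here, not what the question uses; the only change to the query is GROUP BY type instead of GROUP BY 1. The helper name top3_rows is hypothetical.

```python
import sqlite3

# The sample rows from the question: (id, user_id, type, value)
RATINGS = [
    (0, 0, "Rest", 4), (1, 0, "Bar", 3), (2, 0, "Cine", 2), (3, 0, "Cafe", 1),
    (4, 1, "Rest", 4), (5, 1, "Bar", 3), (6, 1, "Cine", 2), (7, 1, "Cafe", 5),
    (8, 2, "Rest", 4), (9, 2, "Bar", 3), (10, 3, "Cine", 2), (11, 3, "Cafe", 5),
]

TOP3_QUERY = """
SELECT r.user_id, r.type, r.value
FROM ratings r
JOIN (
    SELECT type, sum(value) AS type_rnk
    FROM ratings
    GROUP BY type
    ORDER BY type_rnk DESC, type   -- tiebreaker
    LIMIT 3                        -- strictly the top 3
) v USING (type)
ORDER BY r.user_id, v.type_rnk DESC
"""


def top3_rows():
    """Load the sample data into SQLite and return the rows for the top 3 types."""
    con = sqlite3.connect(":memory:")
    con.execute(
        "CREATE TABLE ratings (id INTEGER, user_id INTEGER, type TEXT, value INTEGER)"
    )
    con.executemany("INSERT INTO ratings VALUES (?, ?, ?, ?)", RATINGS)
    rows = con.execute(TOP3_QUERY).fetchall()
    con.close()
    return rows


for row in top3_rows():
    print(row)
```

With the sample data the type totals are Rest 12, Cafe 11, Bar 9 and Cine 6, so Cine is the type that gets dropped.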
I think you just want row_number(). Based on your results, you seem to want three rows per type, with the highest value:
select t.*
from (select t.*,
             row_number() over (partition by type order by value desc) as seqnum
      from t
     ) t
where seqnum <= 3;
Your description suggests that you might just want this per user, which is a slight tweak:
select t.*
from (select t.*,
             -- the column is user_id; a bare "user" is a reserved word in most databases
             row_number() over (partition by user_id order by value desc) as seqnum
      from t
     ) t
where seqnum <= 3;

Calculate moving average with null values

I have a school graduation data set by year and subgroup and have been provided the numerator and denominator and the single year graduation rate but I also need to calculate a 3 year moving average. I was advised by a statistician that no longer works with us that to do this I needed to get the running total for the numerator for 3 years and the running total for 3 years for the denominator. I understand the math behind it and have checked my work by hand and via excel with a few subgroups. I have also calculated this using T-SQL with no problem so long as there are no null records but I’m struggling with the calculation when there are nulls or 0.
I have tried running the query accounting for null by using NULLIF:
SELECT
ID,
Bldg,
GradClass,
Sbgrp ,
TGrads,
TStus,
Rate,
/*Numerator Running total*/
SUM (TGrads) OVER ( partition BY ID, Sbgrp ORDER BY GradClass ROWS BETWEEN 2 preceding AND CURRENT row ) AS NumSum,
/*Denominator Running Total*/
SUM ( TStus) OVER ( partition BY ID, Sbgrp ORDER BY GradClass ROWS BETWEEN 2 preceding AND CURRENT row ) AS DenSum,
/*Moving Year Average*/
(
( SUM ( TGrads) OVER ( partition BY DistrictID, Sbgrp ORDER BY GradClass ROWS BETWEEN 2 preceding AND CURRENT row ) ) / NULLIF ( ( SUM ( TStus) OVER ( partition BY ID, Sbgrp ORDER BY GradClass ROWS BETWEEN 2 preceding AND CURRENT row ) ), 0 ) * 100
) AS 3yrAvg
FROM
KResults.DGSRGradBldg
First question, I was provided a record for all subgroups even if they didn’t have students in the subgroup. I want to keep the record so that all subgroups are accounted for within the district and since I know that they didn’t have data, can I substitute the Null values in Tgrads, TStus with a 0? If I do substitute those values with a 0 how can I show the rate as null?
Second question: how can I compute the rate with either a null or 0 denominator? I understand you can’t divide by 0, but I want to maintain the record so it’s easy and clear to see that they had no data. How can I do this? When I try to calculate this without accounting for nulls I get two errors: 1) Divide by zero error encountered. (8134) and 2) Null value is eliminated by an aggregate or other SET operation. (8153).
Knowing I can’t divide by 0 or Null I modified my query to include NULLIF and when I do that the query runs with no errors but I don’t get accurate percentage for rates that are below 100%. All my rates are now either 100% or 0 - note the last row, the moving average of 2/3 is not 0.
Here’s what the data looks like when I try to account for nulls. Note that the moving three-year average column (3yrAvg) shows 0 wherever the rate should be below 100%.
ID  Bldg  Class  Sbgrp  TGrads  TStus  Rate   NumSum  DenSum  3yrAvg
A   1     2014   A1     46      49     93.9   46      49      0
A   1     2015   A1     41      46     89.1   87      95      0
A   1     2016   A1     47      49     95.9   134     144     0
A   1     2017   A1     38      40     95.0   126     135     0
A   1     2018   A1     59      59     98.3   143     148     0
A   1     2014   A2     1       1      100    1       1       100
A   1     2015   A2     1       1      100
A   1     2016   A2     1       1      100
A   1     2017   A2     2       3      66.7   2       3       0
A   1     2018   A2     2       2      100    4       5       0
Any advice would be appreciated but please provide suggestions kindly to this newbie.
Thanks for your time and help.
Answer to question 1: put this in the SELECT list
ISNULL(TGrads,0) AS TGRADS,
ISNULL(TStus,0) AS TSTUS,
Answer to question 2: I'd do this
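(CASE WHEN SUM(TStus) OVER ( partition BY ID, Sbgrp ORDER BY GradClass ROWS BETWEEN 2 preceding AND CURRENT row ) IS NOT NULL
AND SUM(TStus) OVER ( partition BY ID, Sbgrp ORDER BY GradClass ROWS BETWEEN 2 preceding AND CURRENT row ) <>0
-- 100.0 comes first so the division is done in decimal arithmetic. Assuming TGrads and TStus are integer columns,
-- integer / integer truncates, which is why the rates came out as only 0 or 100.
-- The numerator is partitioned by ID (not DistrictID) so both sums cover the same rows; this assumes ID is the district key.
THEN 100.0 * SUM(TGrads) OVER ( partition BY ID, Sbgrp ORDER BY GradClass ROWS BETWEEN 2 preceding AND CURRENT row ) / SUM(TStus) OVER ( partition BY ID, Sbgrp ORDER BY GradClass ROWS BETWEEN 2 preceding AND CURRENT row )
ELSE NULL END
) AS [3yrAvg]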
I put null after "ELSE"...You can choose your default value.
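For checking the SQL by hand, here is a rough Python sketch of the calculation described in the question: sum the numerator and the denominator over the current row plus the two preceding ones, treat missing values as 0 inside the sums, and return no rate at all when the summed denominator is 0. The function name moving_rate and the sample numbers are made up for illustration; they are not the data from the question.

```python
def moving_rate(grads, stus, window=3):
    """Moving rate in percent per position; None where the window has no students."""
    rates = []
    for i in range(len(grads)):
        lo = max(0, i - window + 1)  # same rows as "ROWS BETWEEN 2 preceding AND CURRENT row"
        num = sum(g or 0 for g in grads[lo:i + 1])  # None counts as 0, like ISNULL(x, 0)
        den = sum(s or 0 for s in stus[lo:i + 1])
        # 100.0 keeps the arithmetic in floating point; a zero denominator yields None
        rates.append(round(100.0 * num / den, 1) if den else None)
    return rates


# Hypothetical subgroup: no data in the first two years, then 2 of 3 and 2 of 2 graduates
grads = [None, 0, 2, 2]
stus = [None, 0, 3, 2]
print(moving_rate(grads, stus))
```

The empty years stay visible as None in the result, so the record is kept while the rate is left blank.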

SQL - Overall average Points

I have a table like this:
[challenge_log]
User_id | challenge | Try | Points
==============================================
1 1 1 5
1 1 2 8
1 1 3 10
1 2 1 5
1 2 2 8
2 1 1 5
2 2 1 8
2 2 2 10
I want the overall average points. To do so, I believe I need 3 steps:
Step 1 - Get the MAX value (of points) of each user in each challenge:
User_id | challenge | Points
===================================
1 1 10
1 2 8
2 1 5
2 2 10
Step 2 - SUM all the MAX values of one user
User_id | Points
===================
1 18
2 15
Step 3 - The average
AVG = SUM (Points from step 2) / number of users = 16.5
Can you help me find a query for this?
You can get the overall average by dividing the total number of points by the number of distinct users. However, you need the maximum per challenge, so the sum is a bit more complicated. One way is with a subquery:
select sum(Points) * 1.0 / count(distinct user_id) -- * 1.0 avoids integer division in databases that truncate int / int
from (select user_id, challenge, max(Points) as Points
      from challenge_log
      group by user_id, challenge
     ) cl;
You can also do this with one level of aggregation, by finding the maximum in the where clause:
select sum(Points) * 1.0 / count(distinct user_id) -- * 1.0 avoids integer division in databases that truncate int / int
from challenge_log cl
where not exists (select 1
                  from challenge_log cl2
                  where cl2.user_id = cl.user_id and
                        cl2.challenge = cl.challenge and
                        cl2.points > cl.points
                 );
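Here is a quick way to check the subquery version against the expected 16.5, using Python's built-in sqlite3 module with the sample rows from the question. SQLite is an assumption on my part since the question does not name a database, and I renamed the Try column to attempt to stay clear of keyword clashes. The helper name overall_average is hypothetical.

```python
import sqlite3

# The sample rows from the question: (user_id, challenge, attempt, points)
LOG = [
    (1, 1, 1, 5), (1, 1, 2, 8), (1, 1, 3, 10),
    (1, 2, 1, 5), (1, 2, 2, 8),
    (2, 1, 1, 5),
    (2, 2, 1, 8), (2, 2, 2, 10),
]

AVG_QUERY = """
SELECT 1.0 * SUM(points) / COUNT(DISTINCT user_id)
FROM (SELECT user_id, challenge, MAX(points) AS points
      FROM challenge_log
      GROUP BY user_id, challenge) AS cl
"""


def overall_average():
    """Best score per user and challenge, summed, divided by the number of users."""
    con = sqlite3.connect(":memory:")
    con.execute(
        "CREATE TABLE challenge_log "
        "(user_id INTEGER, challenge INTEGER, attempt INTEGER, points INTEGER)"
    )
    con.executemany("INSERT INTO challenge_log VALUES (?, ?, ?, ?)", LOG)
    (average,) = con.execute(AVG_QUERY).fetchone()
    con.close()
    return average


print(overall_average())
```

The best scores are 10 and 8 for user 1 and 5 and 10 for user 2, so the query divides 33 by 2. Without the 1.0 factor SQLite would do integer division and return 16.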
Try these on for size.
Overall Mean
select avg( Points ) as mean_score
from challenge_log
Per-Challenge Mean
select challenge ,
avg( Points ) as mean_score
from challenge_log
group by challenge
If you want to compute the mean of each user's highest score per challenge, you're not exactly raising the level of complexity very much:
Overall Mean
select avg( high_score )
from ( select user_id ,
              challenge ,
              max( Points ) as high_score
       from challenge_log
       group by user_id , challenge -- needed so max() is taken per user and challenge
     ) t
Per-Challenge Mean
select challenge ,
       avg( high_score )
from ( select user_id ,
              challenge ,
              max( Points ) as high_score
       from challenge_log
       group by user_id , challenge -- needed so max() is taken per user and challenge
     ) t
group by challenge
After step 1 do
SELECT USER_ID, AVG(POINTS)
FROM STEP1
GROUP BY USER_ID
You can combine step 1 and 2 into a single query/subquery as follows:
Select BestShot.[User_ID], AVG(cast (BestShot.MostPoints as money))
from (select tLog.Challenge, tLog.[User_ID], MostPoints = max(tLog.points)
      from dbo.tmp_Challenge_Log tLog
      Group by tLog.User_ID, tLog.Challenge
     ) BestShot
Group by BestShot.User_ID
The subquery determines the most points for each user/challenge combo, and the outer query takes these max values and uses the AVG function to return the average value of them. The last Group By tells SQL to average all the values across each User_ID.

How to find the SQL medians for a grouping

I am working with SQL Server 2008
If I have a Table as such:
Code Value
-----------------------
4 240
4 299
4 210
2 NULL
2 3
6 30
6 80
6 10
4 240
2 30
How can I find the median AND group by the Code column please?
To get a resultset like this:
Code Median
-----------------------
4 240
2 16.5
6 30
I really like this solution for median, but unfortunately it doesn't include Group By:
https://stackoverflow.com/a/2026609/106227
The solution using rank works nicely when you have an odd number of members in each group, i.e. when the median exists within the sample. Where you have an even number of members, the rank method falls down, e.g.
1
2
3
4
The median here is 2.5 (i.e. half the group is smaller, and half the group is larger) but the rank method will return 3. To get around this you essentially need to take the top value from the bottom half of the group, and the bottom value of the top half of the group, and take an average of the two values.
WITH CTE AS
(   SELECT  Code,
            Value,
            [half1] = NTILE(2) OVER(PARTITION BY Code ORDER BY Value),
            [half2] = NTILE(2) OVER(PARTITION BY Code ORDER BY Value DESC)
    FROM    T
    WHERE   Value IS NOT NULL
)
SELECT  Code,
        (MAX(CASE WHEN Half1 = 1 THEN Value END) +
         MIN(CASE WHEN Half2 = 1 THEN Value END)) / 2.0
FROM    CTE
GROUP BY Code;
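The halves trick can be checked outside the database with a few lines of Python. This is only a sketch of the idea, with a made-up function name median_by_halves: it drops NULLs, takes the lower and upper ceil(n/2) values (what NTILE(2) puts in its first bucket for ascending and descending order), and averages the largest of the lower half with the smallest of the upper half.

```python
import math


def median_by_halves(values):
    """Median via the two overlapping halves; None values are ignored."""
    present = sorted(v for v in values if v is not None)
    if not present:
        return None
    half = math.ceil(len(present) / 2)  # size of the first NTILE(2) bucket
    lower = present[:half]   # bucket 1 when ordering ascending
    upper = present[-half:]  # bucket 1 when ordering descending
    return (lower[-1] + upper[0]) / 2.0


# The sample data from the question, grouped by Code
GROUPS = {
    4: [240, 299, 210, 240],
    2: [None, 3, 30],
    6: [30, 80, 10],
}

for code, values in GROUPS.items():
    print(code, median_by_halves(values))
```

For an odd count the two halves share the middle value, so the average is that value itself; for an even count they meet at the two middle values.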
In SQL Server 2012 and later you can use PERCENTILE_CONT:
SELECT DISTINCT
Code,
Median = PERCENTILE_CONT(0.5) WITHIN GROUP (ORDER BY Value) OVER(PARTITION BY Code)
FROM T;
SQL Server 2008 does not have a built-in function to calculate medians, but you could use the ROW_NUMBER function like this:
WITH RankedTable AS (
    SELECT Code, Value,
        ROW_NUMBER() OVER (PARTITION BY Code ORDER BY VALUE) AS Rnk,
        COUNT(*) OVER (PARTITION BY Code) AS Cnt
    FROM MyTable
)
SELECT Code, Value
FROM RankedTable
WHERE Rnk = Cnt / 2 + 1
To elaborate a bit on this solution, consider the output of the RankedTable CTE:
Code Value Rnk Cnt
---------------------------
4 240 2 3 -- Median
4 299 3 3
4 210 1 3
2 NULL 1 2
2 3 2 2 -- Median
6 30 2 3 -- Median
6 80 3 3
6 10 1 3
Now from this result set, if you only return those rows where Rnk equals Cnt / 2 + 1 (integer division), you get only the rows with the median value for each group.