SQL - Overall average Points - sql

I have a table like this:
[challenge_log]
User_id | challenge | Try | Points
==============================================
1 1 1 5
1 1 2 8
1 1 3 10
1 2 1 5
1 2 2 8
2 1 1 5
2 2 1 8
2 2 2 10
I want the overall average points. To do so, I believe I need three steps:
Step 1 - Get the MAX value (of points) of each user in each challenge:
User_id | challenge | Points
===================================
1 1 10
1 2 8
2 1 5
2 2 10
Step 2 - SUM the MAX values for each user:
User_id | Points
===================
1 18
2 15
Step 3 - The average
AVG = SUM(Points from step 2) / number of users = (18 + 15) / 2 = 16.5
Can you help me find a query for this?

You can get the overall average by dividing the total number of points by the number of distinct users. However, you need the maximum per challenge, so the sum is a bit more complicated. One way is with a subquery (multiplying by 1.0 avoids integer division when Points is an integer column):
select sum(Points) * 1.0 / count(distinct User_id)
from (select User_id, challenge, max(Points) as Points
from challenge_log
group by User_id, challenge
) cl;
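As a quick check, here is a minimal sketch that loads the question's sample rows into an in-memory SQLite database through Python's standard sqlite3 module and runs the subquery approach. SQLite is only a stand-in (the question doesn't name its database), and the * 1.0 is there because SQLite, like SQL Server and PostgreSQL, truncates integer / integer.

```python
import sqlite3

# Sample rows from the question: (user_id, challenge, try, points)
ROWS = [
    (1, 1, 1, 5), (1, 1, 2, 8), (1, 1, 3, 10),
    (1, 2, 1, 5), (1, 2, 2, 8),
    (2, 1, 1, 5),
    (2, 2, 1, 8), (2, 2, 2, 10),
]


def overall_average(rows):
    con = sqlite3.connect(":memory:")
    con.execute("create table challenge_log (user_id int, challenge int, try int, points int)")
    con.executemany("insert into challenge_log values (?, ?, ?, ?)", rows)
    # Step 1 happens in the subquery (best score per user and challenge);
    # steps 2 and 3 happen in the outer query (total / number of users).
    (result,) = con.execute("""
        select sum(points) * 1.0 / count(distinct user_id)
        from (select user_id, challenge, max(points) as points
              from challenge_log
              group by user_id, challenge) cl
    """).fetchone()
    con.close()
    return result


print(overall_average(ROWS))  # 16.5
```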
You can also do this with one level of aggregation, by finding the maximum in the where clause. Note that if a user scores the same maximum on two tries of one challenge, both rows pass the filter and that maximum is counted twice:
select sum(Points) * 1.0 / count(distinct User_id)
from challenge_log cl
where not exists (select 1
from challenge_log cl2
where cl2.User_id = cl.User_id and
cl2.challenge = cl.challenge and
cl2.Points > cl.Points
);
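The sketch below runs the NOT EXISTS version on the same sample rows in SQLite via Python's sqlite3 (again a stand-in engine), then adds one hypothetical extra try that ties user 2's best score in challenge 2, to show how a tie on the maximum inflates the result.

```python
import sqlite3

QUERY = """
    select sum(points) * 1.0 / count(distinct user_id)
    from challenge_log cl
    where not exists (select 1
                      from challenge_log cl2
                      where cl2.user_id = cl.user_id
                        and cl2.challenge = cl.challenge
                        and cl2.points > cl.points)
"""


def run(rows):
    con = sqlite3.connect(":memory:")
    con.execute("create table challenge_log (user_id int, challenge int, try int, points int)")
    con.executemany("insert into challenge_log values (?, ?, ?, ?)", rows)
    (result,) = con.execute(QUERY).fetchone()
    con.close()
    return result


# Sample rows from the question: (user_id, challenge, try, points)
ROWS = [
    (1, 1, 1, 5), (1, 1, 2, 8), (1, 1, 3, 10),
    (1, 2, 1, 5), (1, 2, 2, 8),
    (2, 1, 1, 5),
    (2, 2, 1, 8), (2, 2, 2, 10),
]
print(run(ROWS))  # 16.5

# Hypothetical extra try tying user 2's best score in challenge 2:
# both 10-point rows survive NOT EXISTS, so 10 is summed twice.
print(run(ROWS + [(2, 2, 3, 10)]))  # 21.5
```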

Try these on for size.
Overall Mean
select avg( Points ) as mean_score
from challenge_log
Per-Challenge Mean
select challenge ,
avg( Points ) as mean_score
from challenge_log
group by challenge
If you want to compute the mean of each user's highest score per challenge, you're not exactly raising the level of complexity very much; the subquery just needs to group by user and challenge:
Overall Mean
select avg( high_score )
from ( select user_id ,
challenge ,
max( Points ) as high_score
from challenge_log
group by user_id ,
challenge
) t
Per-Challenge Mean
select challenge ,
avg( high_score )
from ( select user_id ,
challenge ,
max( Points ) as high_score
from challenge_log
group by user_id ,
challenge
) t
group by challenge
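Run against the question's sample rows, the grouped version averages over user/challenge pairs. This is a minimal check using Python's sqlite3 module as a stand-in engine (an assumption, not the asker's database). Note that the overall figure, 8.25, is the mean of the four per-challenge best scores, which is different from the 16.5 per-user figure the question asks for.

```python
import sqlite3

# Sample rows from the question: (user_id, challenge, try, points)
ROWS = [
    (1, 1, 1, 5), (1, 1, 2, 8), (1, 1, 3, 10),
    (1, 2, 1, 5), (1, 2, 2, 8),
    (2, 1, 1, 5),
    (2, 2, 1, 8), (2, 2, 2, 10),
]

con = sqlite3.connect(":memory:")
con.execute("create table challenge_log (user_id int, challenge int, try int, points int)")
con.executemany("insert into challenge_log values (?, ?, ?, ?)", ROWS)

# Best score per (user, challenge). Without the GROUP BY, SQLite would
# collapse this to a single row and stricter engines would reject it.
HIGH_SCORES = """
    select user_id, challenge, max(points) as high_score
    from challenge_log
    group by user_id, challenge
"""

overall = con.execute(f"select avg(high_score) from ({HIGH_SCORES}) t").fetchone()[0]
per_challenge = con.execute(
    f"select challenge, avg(high_score) from ({HIGH_SCORES}) t "
    "group by challenge order by challenge"
).fetchall()

print(overall)        # 8.25
print(per_challenge)  # [(1, 7.5), (2, 9.0)]
```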

After step 1, run the following, where STEP1 stands for the step 1 result (for example a derived table or CTE):
SELECT USER_ID, AVG(POINTS)
FROM STEP1
GROUP BY USER_ID

You can combine steps 1 and 2 into a single query with a subquery, as follows:
Select BestShot.[User_ID], AVG(cast (BestShot.MostPoints as money))
from (select tLog.Challenge, tLog.[User_ID], MostPoints = max(tLog.points)
from dbo.tmp_Challenge_Log tLog
Group by tLog.User_ID, tLog.Challenge
) BestShot
Group by BestShot.User_ID
The subquery finds the most points for each user/challenge combination, and the outer query averages those maximums with AVG. The final GROUP BY makes SQL compute that average separately for each User_ID. Casting to money keeps the average from being truncated, since in SQL Server AVG over an int column returns an int.
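Here is a sketch of the same two-level query against the question's sample rows, using SQLite through Python's sqlite3 as a stand-in for SQL Server. SQLite has no money type, so * 1.0 takes the place of the cast. It also shows that swapping AVG for SUM in the outer query yields the per-user totals from step 2 of the question.

```python
import sqlite3

# Sample rows from the question: (user_id, challenge, try, points)
ROWS = [
    (1, 1, 1, 5), (1, 1, 2, 8), (1, 1, 3, 10),
    (1, 2, 1, 5), (1, 2, 2, 8),
    (2, 1, 1, 5),
    (2, 2, 1, 8), (2, 2, 2, 10),
]

con = sqlite3.connect(":memory:")
con.execute("create table challenge_log (user_id int, challenge int, try int, points int)")
con.executemany("insert into challenge_log values (?, ?, ?, ?)", ROWS)

# Step 1: most points per user/challenge combination.
BEST_SHOT = """
    select user_id, challenge, max(points) as most_points
    from challenge_log
    group by user_id, challenge
"""

# Outer query per user: AVG as in the answer, SUM for the question's step 2.
# SQLite's AVG already returns a float; * 1.0 mirrors the cast, which matters
# in engines where AVG over integers returns an integer.
averages = con.execute(
    f"select user_id, avg(most_points * 1.0) from ({BEST_SHOT}) best_shot "
    "group by user_id order by user_id"
).fetchall()
totals = con.execute(
    f"select user_id, sum(most_points) from ({BEST_SHOT}) best_shot "
    "group by user_id order by user_id"
).fetchall()

print(averages)  # [(1, 9.0), (2, 7.5)]
print(totals)    # [(1, 18), (2, 15)]
```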

Related

Select column's occurrence order without group by

I currently have two tables, users and coupons:
users
id | first_name
===============
1  | Roberta
2  | Oliver
3  | Shayna
4  | Fechin

coupons
id | discount | user_id
=======================
1  | 20%      | 1
2  | 40%      | 2
3  | 15%      | 3
4  | 30%      | 1
5  | 10%      | 1
6  | 70%      | 4
What I want to do is select from the coupons table until I've selected X users.
So if I chose X = 2, the resulting table would be:
id | discount | user_id
=======================
1  | 20%      | 1
2  | 40%      | 2
4  | 30%      | 1
5  | 10%      | 1
I've tried using both dense_rank and row_number, but they return the count of occurrences of each user_id, not its order.
SELECT id,
discount,
user_id,
dense_rank() OVER (PARTITION BY user_id)
FROM coupons
I'm guessing I need to do it in multiple subqueries (which is fine) where the first subquery would return something like
id | discount | user_id | order_of_occurence
============================================
1  | 20%      | 1       | 1
2  | 40%      | 2       | 2
3  | 15%      | 3       | 3
4  | 30%      | 1       | 1
5  | 10%      | 1       | 1
6  | 70%      | 4       | 4
which I can then use to filter by what I need.
PS: I'm using PostgreSQL.
You've stated that you want to parameterize the query so that you can retrieve X users. I'm reading that as all coupons for the first X distinct user_ids in coupon id column order.
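It appears your attempt was close. dense_rank() is the right idea. Since you want to look over the entire table, you can't partition the ranking by user_id. A sorting column is also required to determine the ranking, but it can't simply be id: ids are unique, so every coupon would get its own rank. Instead, rank on each user's first coupon id, which min(id) partitioned by user_id provides. Window function calls can't be nested, so this takes two steps:
with firsts as (
select *,
min(id) over (partition by user_id) as first_id
from coupons
), data as (
select *,
dense_rank() over (order by first_id) as dr
from firsts
)
select id, discount, user_id from data where dr <= <X> order by id;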
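A minimal sketch of the two-step ranking (first coupon id per user, then dense_rank() over that value), run against the question's coupons in SQLite through Python's sqlite3. SQLite stands in for PostgreSQL here and needs version 3.25 or later for window functions; the ? placeholder plays the role of X. Ranking directly by the unique coupon id would give every row its own rank, which is why the ranking key is each user's first id.

```python
import sqlite3

# (id, discount, user_id) rows from the question's coupons table
COUPONS = [
    (1, "20%", 1), (2, "40%", 2), (3, "15%", 3),
    (4, "30%", 1), (5, "10%", 1), (6, "70%", 4),
]

QUERY = """
    with firsts as (
        -- each coupon row tagged with its user's first coupon id
        select *, min(id) over (partition by user_id) as first_id
        from coupons
    ), data as (
        -- users numbered 1, 2, 3, ... in order of first appearance
        select *, dense_rank() over (order by first_id) as dr
        from firsts
    )
    select id, discount, user_id from data where dr <= ? order by id
"""


def coupons_for_first_users(x):
    con = sqlite3.connect(":memory:")
    con.execute("create table coupons (id int primary key, discount text, user_id int)")
    con.executemany("insert into coupons values (?, ?, ?)", COUPONS)
    rows = con.execute(QUERY, (x,)).fetchall()
    con.close()
    return rows


print(coupons_for_first_users(2))
# [(1, '20%', 1), (2, '40%', 2), (4, '30%', 1), (5, '10%', 1)]
```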

How to get rank of a user from all users

I have a table called summary_coins. I am trying to get a user's rank based on their coin totals.
I have tried the following:
SELECT
user_id,
sum(get_count),
rank() over (order by sum(get_count) asc) as rank
FROM summary_coins
WHERE user_id = 2
GROUP BY user_id
Sample data: without user_id = 2 in the WHERE clause, I get the list below:
user_id sum rank
44 2 1
13 4 2
57 4 2
47 4 2
11 5 5
2 5 5
My desired output:
2 5 5
Here I always get rank 1 for user ID 2, but in the list of all users it should be rank 5.
You want to apply WHERE user_id = 2 later. Window functions such as RANK() OVER are evaluated after WHERE and GROUP BY, so your filter removes every other user before the ranking is computed and user 2 is ranked against itself alone. To apply the WHERE clause after ranking, make your query a subquery and select from it:
SELECT user_id, sum_count, rank
FROM
(
SELECT
user_id,
sum(get_count) AS sum_count,
rank() over (order by sum(get_count) asc) as rank
FROM summary_coins
GROUP BY user_id
) all_users
WHERE user_id = 2;
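To see the difference, here is a sketch that runs both the original query and the subquery version in SQLite via Python's sqlite3 (a stand-in engine; window functions need SQLite 3.25+). The rows are hypothetical, chosen only so that the per-user sums match the list in the question; the alias is rnk simply to keep it distinct from the rank() function.

```python
import sqlite3

# Hypothetical (user_id, get_count) rows whose per-user sums match the question.
ROWS = [(44, 2), (13, 4), (57, 4), (47, 1), (47, 3), (11, 5), (2, 2), (2, 3)]

con = sqlite3.connect(":memory:")
con.execute("create table summary_coins (user_id int, get_count int)")
con.executemany("insert into summary_coins values (?, ?)", ROWS)

# Filter before ranking: only user 2 is left, so it ranks 1.
too_early = con.execute("""
    select user_id, sum(get_count), rank() over (order by sum(get_count) asc)
    from summary_coins
    where user_id = 2
    group by user_id
""").fetchall()

# Rank all users first, then filter in the outer query.
late = con.execute("""
    select user_id, sum_count, rnk
    from (select user_id,
                 sum(get_count) as sum_count,
                 rank() over (order by sum(get_count) asc) as rnk
          from summary_coins
          group by user_id) all_users
    where user_id = 2
""").fetchall()

print(too_early)  # [(2, 5, 1)]
print(late)       # [(2, 5, 5)]
```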

Use aggregation only on rows where count(ID) is greater than one

Hi, I have the following table:
Cash_table
ID Cash Rates
1 50 3
2 100 4
3 70 10
3 60 10
4 13 7
5 20 8
5 10 10
6 10 5
What I want as a result is to aggregate all the entries that have Count(id) > 1, like this:
ID New_Cash New_Rates
1 50 3
2 100 4
3 (70+60)/(10+10) 10+10
4 13 7
5 (20+10)/(8+10) 8+10
6 10 5
So I only want to change the rows where Count(id)>1 and leave the rest like it was.
For the rows with count(id)>1 I want to sum up the rates and take the sum of the cash and divide it by the sum of the rates. The Rates alone aren't a problem since I can sum them up and group by id and get the desired result.
The problem is with the cash column:
I am trying to do it with a CASE expression, but it isn't working:
select id, sum(rates) as new_rates, case
when count(id)>1 then sum(cash)/nullif(sum(rates),0)
else cash
end as new_cash
from Cash_table
group by id
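You only need GROUP BY id and aggregation (the * 1.0 avoids integer division in databases that truncate int / int, such as SQL Server and PostgreSQL):
select
id,
sum(cash) * 1.0 / (case count(*) when 1 then 1 else sum(rates) end) as new_cash,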
sum(rates) as new_rates
from Cash_table
group by id
order by id
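A minimal check of the grouped query against the question's rows, using SQLite via Python's sqlite3 as a stand-in engine (the question doesn't say which database it uses). With * 1.0 in place, id 3 comes out as 130 / 20 = 6.5 and id 5 as 30 / 18, while single-row ids keep their cash unchanged.

```python
import sqlite3

# (ID, Cash, Rates) rows from the question
ROWS = [(1, 50, 3), (2, 100, 4), (3, 70, 10), (3, 60, 10),
        (4, 13, 7), (5, 20, 8), (5, 10, 10), (6, 10, 5)]

con = sqlite3.connect(":memory:")
con.execute("create table cash_table (id int, cash int, rates int)")
con.executemany("insert into cash_table values (?, ?, ?)", ROWS)

# Single-row ids divide by 1 (cash unchanged); multi-row ids get
# sum(cash) / sum(rates). The * 1.0 avoids integer division.
result = con.execute("""
    select id,
           sum(cash) * 1.0 / (case count(*) when 1 then 1 else sum(rates) end) as new_cash,
           sum(rates) as new_rates
    from cash_table
    group by id
    order by id
""").fetchall()

for row in result:
    print(row)
```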
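You can aggregate the rates and cash columns with sum(), grouping by id:
select
id,
sum(cash) / decode( count(*), 1, 1,
decode( sum( nvl(rates,0) ), 0, 1, sum( nvl(rates,0) ) ) ) as new_cash,
sum(rates) as new_rates
from cash_table
group by id
nvl() treats NULL rates as 0 before summing. Oracle does support nullif() as well; what breaks the query in the question is the bare cash column, which is neither aggregated nor listed in the GROUP BY.
decode() takes the place of the case expression: the outer decode() leaves single-row ids undivided, and the inner one guards against division by zero.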

Get Percentile for a user

I have a table such as this:
Id, ReportId, UserId
1 1 1
2 2 1
3 3 1
4 4 1
5 1 2
6 2 2
7 3 2
8 1 3
9 2 3
10 1 4
My table has thousands of records; the above is just a simplified example of the table structure, to illustrate the problem.
I'm trying to figure out at what percentile a user sits based on how many reports they have read.
I've been looking into PERCENTILE_CONT and PERCENTILE_DISC functions, but I fail to understand them properly. https://learn.microsoft.com/en-us/sql/t-sql/functions/percentile-cont-transact-sql
What confuses me most is that these functions appear to return the value at a given percentile (such as the 50th), not the percentile of a specific record.
Maybe I'm just not understanding this correctly. Is there a better way?
EDIT:
To clarify, I want to know at what percentile a specific user (in this case the user with id 1) sits based on how many reports they have read. If they read the most reports they would be at a higher percentile; what is that percentile? Let's say there are exactly 100 users; then the person with the most reports read would be in the 1st percentile.
Update #2
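PERCENT_RANK() or CUME_DIST() should do it. The query also shows PERCENTILE_DISC and PERCENTILE_CONT for comparison; because they are partitioned by UserId and each user has a single row, they just return that user's own count: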
select
a.UserId,
a.reports_read,
PERCENTILE_DISC(0.5) WITHIN GROUP (ORDER BY a.reports_read) OVER (partition by UserId) AS percentile_d,
PERCENTILE_CONT(0.5) WITHIN GROUP (ORDER BY a.reports_read) OVER (partition by UserId) AS percentile_c,
PERCENT_RANK() OVER(ORDER BY a.reports_read ) percent_rank,
CUME_DIST() OVER(ORDER BY a.reports_read ) AS cumulative_distance
from
(select UserId, count(distinct(ReportId)) as reports_read
from #tmp
group by UserId
) a
It gives the following results:
UserId reports_read percentile_d percentile_c percent_rank cumulative_distance
4 1 1 1 0 0.25
3 2 2 2 0.33333 0.5
2 3 3 3 0.66667 0.75
1 4 4 4 1 1
I hope this helps.
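For a self-contained check, this sketch runs the PERCENT_RANK()/CUME_DIST() part against the question's sample rows in SQLite through Python's sqlite3 (a stand-in for SQL Server; window functions need SQLite 3.25+). The table name reads is hypothetical and stands in for #tmp.

```python
import sqlite3

# (Id, ReportId, UserId) rows from the question
ROWS = [(1, 1, 1), (2, 2, 1), (3, 3, 1), (4, 4, 1), (5, 1, 2),
        (6, 2, 2), (7, 3, 2), (8, 1, 3), (9, 2, 3), (10, 1, 4)]

con = sqlite3.connect(":memory:")
con.execute("create table reads (id int, report_id int, user_id int)")
con.executemany("insert into reads values (?, ?, ?)", ROWS)

# percent_rank = (rank - 1) / (rows - 1); cume_dist = share of rows <= this one
result = con.execute("""
    select user_id,
           reports_read,
           percent_rank() over (order by reports_read) as pct_rank,
           cume_dist()    over (order by reports_read) as cume
    from (select user_id, count(distinct report_id) as reports_read
          from reads
          group by user_id) a
    order by reports_read
""").fetchall()

for row in result:
    print(row)
```

To report a single user, wrap this query and filter on user_id in the outer SELECT; filtering inside would rank that user against nobody else.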

How to find the SQL medians for a grouping

I am working with SQL Server 2008
If I have a table like this:
Code Value
-----------------------
4 240
4 299
4 210
2 NULL
2 3
6 30
6 80
6 10
4 240
2 30
How can I find the median AND group by the Code column please?
To get a resultset like this:
Code Median
-----------------------
4 240
2 16.5
6 30
I really like this solution for median, but unfortunately it doesn't include Group By:
https://stackoverflow.com/a/2026609/106227
The solution using rank works nicely when each group has an odd number of members, i.e. when the median exists within the sample. When a group has an even number of members, the rank method falls down, e.g.:
1
2
3
4
The median here is 2.5 (i.e. half the group is smaller, and half the group is larger) but the rank method will return 3. To get around this you essentially need to take the top value from the bottom half of the group, and the bottom value of the top half of the group, and take an average of the two values.
WITH CTE AS
( SELECT Code,
Value,
[half1] = NTILE(2) OVER(PARTITION BY Code ORDER BY Value),
[half2] = NTILE(2) OVER(PARTITION BY Code ORDER BY Value DESC)
FROM T
WHERE Value IS NOT NULL
)
SELECT Code,
(MAX(CASE WHEN Half1 = 1 THEN Value END) +
MIN(CASE WHEN Half2 = 1 THEN Value END)) / 2.0
FROM CTE
GROUP BY Code;
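To see the NTILE approach produce the expected result set, here is a sketch running the same CTE against the question's rows in SQLite via Python's sqlite3, standing in for SQL Server (window functions need SQLite 3.25+). The table name t follows the answer.

```python
import sqlite3

# (Code, Value) rows from the question; None is SQL NULL
ROWS = [(4, 240), (4, 299), (4, 210), (2, None), (2, 3),
        (6, 30), (6, 80), (6, 10), (4, 240), (2, 30)]

con = sqlite3.connect(":memory:")
con.execute("create table t (code int, value int)")
con.executemany("insert into t values (?, ?)", ROWS)

# half1 = 1 marks the lower half, half2 = 1 the upper half; for odd counts
# both halves include the middle row, so the average is that row's value.
medians = con.execute("""
    with cte as (
        select code, value,
               ntile(2) over (partition by code order by value)      as half1,
               ntile(2) over (partition by code order by value desc) as half2
        from t
        where value is not null
    )
    select code,
           (max(case when half1 = 1 then value end) +
            min(case when half2 = 1 then value end)) / 2.0 as median
    from cte
    group by code
    order by code
""").fetchall()

print(medians)  # [(2, 16.5), (4, 240.0), (6, 30.0)]
```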
In SQL Server 2012 and later, you can use PERCENTILE_CONT:
SELECT DISTINCT
Code,
Median = PERCENTILE_CONT(0.5) WITHIN GROUP (ORDER BY Value) OVER(PARTITION BY Code)
FROM T;
SQL Server does not have a function to calculate medians, but you could use the ROW_NUMBER function like this:
WITH RankedTable AS (
SELECT Code, Value,
ROW_NUMBER() OVER (PARTITION BY Code ORDER BY VALUE) AS Rnk,
COUNT(*) OVER (PARTITION BY Code) AS Cnt
FROM MyTable
)
SELECT Code, Value
FROM RankedTable
WHERE Rnk = Cnt / 2 + 1
To elaborate a bit on this solution, consider the output of the RankedTable CTE for the first eight rows of the sample data:
Code Value Rnk Cnt
---------------------------
4 240 2 3 -- Median
4 299 3 3
4 210 1 3
2 NULL 1 2
2 3 2 2 -- Median
6 30 2 3 -- Median
6 80 3 3
6 10 1 3
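Now, if you return only the rows where Rnk equals Cnt / 2 + 1 (integer division), you get one row per group, and for groups with an odd number of rows that row holds the median. Two caveats: for an even number of rows it returns the upper of the two middle values rather than their average, and NULLs are numbered too (SQL Server sorts them first). On the full sample data, where Code 2 also has the value 30, this query therefore returns 3 for Code 2 instead of 16.5.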
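Because Rnk = Cnt / 2 + 1 selects a single row, it returns the upper of the two middle values for even-sized groups, and NULLs take part in the numbering. A minimal variant that drops NULLs and averages the middle row(s) is sketched below. It is an extension of this answer, not part of it, and it runs in SQLite through Python's sqlite3 as a stand-in for SQL Server 2008 (which also supports ROW_NUMBER() and COUNT(*) OVER (PARTITION BY ...)).

```python
import sqlite3

# (Code, Value) rows from the question; None is SQL NULL
ROWS = [(4, 240), (4, 299), (4, 210), (2, None), (2, 3),
        (6, 30), (6, 80), (6, 10), (4, 240), (2, 30)]

con = sqlite3.connect(":memory:")
con.execute("create table mytable (code int, value int)")
con.executemany("insert into mytable values (?, ?)", ROWS)

medians = con.execute("""
    with ranked as (
        select code, value,
               row_number() over (partition by code order by value) as rnk,
               count(*)     over (partition by code)                as cnt
        from mytable
        where value is not null
    )
    -- * 1.0 keeps the average fractional in engines where AVG over
    -- integers returns an integer (e.g. SQL Server).
    select code, avg(value * 1.0) as median
    from ranked
    -- Odd cnt: both expressions pick the same middle row.
    -- Even cnt: they pick the two middle rows, which AVG combines.
    where rnk in ((cnt + 1) / 2, cnt / 2 + 1)
    group by code
    order by code
""").fetchall()

print(medians)  # [(2, 16.5), (4, 240.0), (6, 30.0)]
```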