How to get MAX int but exclude specific int? - sql

I'd like to get the max integer from a specific column excluding a value that will always be the max if present. Data may look like the following:
score  empid
1      3
3      3
10     3
1      5
2      5
1      8
2      8
3      8
10     8
In the above, I'd like the MAX score per empid, excluding 10. MAX(score) doesn't work in this case since it will bring back 10. Results should look like this:
score  empid
3      3
2      5
3      8
Any suggestions?

Here's an alternative method:
SELECT
MAX(CASE WHEN score = 10 THEN NULL ELSE score END) AS [max_score],
empid
FROM
table
GROUP BY
empid
This may be preferable if you want to avoid the sub-select.
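A quick way to check the CASE approach (and the NULLIF shorthand from a later answer) is to run it on the question's data. The sketch below uses Python's built-in sqlite3 module, so SQLite stands in for SQL Server; the table name `scores` is a hypothetical replacement for `table`, which is a reserved word.

```python
import sqlite3

# Sample data from the question: (score, empid)
rows = [(1, 3), (3, 3), (10, 3), (1, 5), (2, 5),
        (1, 8), (2, 8), (3, 8), (10, 8)]

conn = sqlite3.connect(":memory:")
# "scores" is a hypothetical table name ("table" itself is a reserved word)
conn.execute("CREATE TABLE scores (score INTEGER, empid INTEGER)")
conn.executemany("INSERT INTO scores VALUES (?, ?)", rows)

# CASE turns the excluded value into NULL, and MAX() ignores NULLs
case_query = """
SELECT MAX(CASE WHEN score = 10 THEN NULL ELSE score END) AS max_score, empid
FROM scores
GROUP BY empid
ORDER BY empid
"""
# NULLIF(score, 10) is shorthand for the same CASE expression
nullif_query = """
SELECT MAX(NULLIF(score, 10)) AS max_score, empid
FROM scores
GROUP BY empid
ORDER BY empid
"""

case_result = conn.execute(case_query).fetchall()
nullif_result = conn.execute(nullif_query).fetchall()
print(case_result)  # [(3, 3), (2, 5), (3, 8)]
```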

select max(score) , empid
from table
where score < (select max(score) from table )
group by empid

Following Ben English's answer, if you are excluding only one value, you can also use NULLIF for less typing:
SELECT MAX(NULLIF(score, 10)), ...
FROM ...
GROUP BY ...
With CASE WHEN it's possible to exclude a range of values. Here we exclude all scores greater than 10:
SELECT MAX(CASE WHEN score > 10 THEN NULL ELSE score END), ...
FROM ...
GROUP BY ...
And here is the IIF version (SQL Server 2012 and later) for less typing:
SELECT MAX(IIF(score > 10, NULL, score)), ...
FROM ...
GROUP BY ...

Related

SQL: compare the values of 2 columns and select the row with the max value

I have a table like this:
GROUP  NAME  Value_1  Value_2
1      ABC   0        0
1      DEF   4        4
50     XYZ   6        6
50     QWE   6        7
100    XYZ   26       2
100    QWE   26       2
What I would like to do is group by GROUP and select the NAME with the highest Value_1. If their Value_1 values are the same, select the one with the highest Value_2. If they're still the same, select the first one.
The output will be something like:
GROUP  NAME  Value_1  Value_2
1      DEF   4        4
50     QWE   6        7
100    XYZ   26       2
The challenge for me is that I don't know how many categories NAME has, so a simple CASE WHEN doesn't work. Thanks for the help.
You can use window functions to solve the bulk of your problem:
select t.*
from (select t.*,
             -- GROUP is a reserved word, so it must be quoted ("group" is standard; MySQL uses backticks)
             row_number() over (partition by "group" order by value_1 desc, value_2 desc) as seqnum
      from t
     ) t
where seqnum = 1;
The one caveat is the condition:
If they're still the same, select the first one.
SQL tables represent unordered (multi-) sets. There is no "first" one unless a column specifies the ordering. The best you can do is choose an arbitrary value when all the other values are the same.
That said, you might have another column that has an ordering. If so, add that as a third key to the order by.
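Here is a minimal sketch of the ROW_NUMBER() approach with a third sort key, run through Python's sqlite3 module (SQLite 3.25 or later is assumed, since that is when it gained window functions). Two assumptions: the `group` column is renamed `grp` to avoid quoting a reserved word, and SQLite's implicit `rowid` (insertion order) stands in for the ordering column that defines "the first one".

```python
import sqlite3

# Sample data from the question; "group" is renamed grp (illustrative choice)
rows = [
    (1, "ABC", 0, 0),
    (1, "DEF", 4, 4),
    (50, "XYZ", 6, 6),
    (50, "QWE", 6, 7),
    (100, "XYZ", 26, 2),
    (100, "QWE", 26, 2),
]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (grp INTEGER, name TEXT, value_1 INTEGER, value_2 INTEGER)")
conn.executemany("INSERT INTO t VALUES (?, ?, ?, ?)", rows)

# rowid (insertion order) is the assumed third key that breaks full ties
query = """
SELECT grp, name, value_1, value_2
FROM (
    SELECT t.*,
           ROW_NUMBER() OVER (PARTITION BY grp
                              ORDER BY value_1 DESC, value_2 DESC, rowid) AS seqnum
    FROM t
) ranked
WHERE seqnum = 1
ORDER BY grp
"""
best = conn.execute(query).fetchall()
print(best)  # [(1, 'DEF', 4, 4), (50, 'QWE', 6, 7), (100, 'XYZ', 26, 2)]
```

Without the `rowid` key, group 100 could return either XYZ or QWE, which is exactly the caveat above.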

Get previous value from column A when column B is not null in Hive

I have the table tableA below:
ID  number  Estimate  Client
--  ------  --------  ------
1   3       8         A
1   NULL    10        NULL
1   5       11        A
1   NULL    19        NULL
2   NULL    20        NULL
2   2       70        A
.......
For each row where the number column is not null, I would like to select the Estimate value from the previous row. For instance, when number = 3, then pre_estimate = NULL; when number = 5, then pre_estimate = 10; and when number = 2, then pre_estimate = 20.
The query below does not seem to return the correct answer in Hive. What should be correct way to do it?
select lag(Estimate, 1) OVER (partition by ID) as prev_estimate
from tableA
where number is not null
Consider a table with the following structure:
number - int
estimate - int
order_column - int
order_column is the column that defines the order of your table rows.
Data in table:
number estimate order_column
3 8 1
NULL 10 2
5 11 3
NULL 19 4
NULL 20 5
2 70 6
I used the following query and got the result you have mentioned.
SELECT *
FROM (SELECT number, estimate,
             lag(estimate, 1) over (order by order_column) as prev_estimate
      FROM tableA) tbl
WHERE tbl.number is not null;
As far as I can tell, there is no need to partition by ID, which is why I left ID out of the table.
You were getting wrong results because the WHERE clause in the main query is applied first: it keeps only the rows where number is not null, and only then is the lag function computed. You need lag to consider all the rows, and only afterwards select the rows where number is not null.
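The difference between filtering before and after LAG() can be reproduced with a small sketch. It uses Python's sqlite3 module, so SQLite (3.25 or later, for window functions) stands in for Hive; the window semantics are the same, but the queries have not been run on Hive itself.

```python
import sqlite3

# Rows from the answer: (number, estimate, order_column)
rows = [(3, 8, 1), (None, 10, 2), (5, 11, 3),
        (None, 19, 4), (None, 20, 5), (2, 70, 6)]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tableA (number INTEGER, estimate INTEGER, order_column INTEGER)")
conn.executemany("INSERT INTO tableA VALUES (?, ?, ?)", rows)

# Filter first: WHERE drops the NULL rows before LAG() runs,
# so LAG() can only look back at other non-null rows.
filter_first = conn.execute("""
    SELECT number, estimate,
           LAG(estimate, 1) OVER (ORDER BY order_column) AS prev_estimate
    FROM tableA
    WHERE number IS NOT NULL
    ORDER BY order_column
""").fetchall()

# LAG() first: compute it over every row in a subquery, then filter.
lag_first = conn.execute("""
    SELECT number, estimate, prev_estimate
    FROM (
        SELECT number, estimate, order_column,
               LAG(estimate, 1) OVER (ORDER BY order_column) AS prev_estimate
        FROM tableA
    ) tbl
    WHERE tbl.number IS NOT NULL
    ORDER BY order_column
""").fetchall()

print(filter_first)  # [(3, 8, None), (5, 11, 8), (2, 70, 11)]
print(lag_first)     # [(3, 8, None), (5, 11, 10), (2, 70, 20)]
```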

Use aggregation only on rows where count(ID) is greater than one

Hi, I have the following table:
Cash_table
ID Cash Rates
1 50 3
2 100 4
3 70 10
3 60 10
4 13 7
5 20 8
5 10 10
6 10 5
What I want as a result is to combine all the entries that have Count(id) > 1, like this:
ID New_Cash New_Rates
1 50 3
2 100 4
3 (70+60)/(10+10) 10+10
4 13 7
5 (20+10)/(8+10) 8+10
6 10 5
So I only want to change the rows where Count(id)>1 and leave the rest like it was.
For the rows with count(id)>1 I want to sum up the rates and take the sum of the cash and divide it by the sum of the rates. The Rates alone aren't a problem since I can sum them up and group by id and get the desired result.
The problem is with the cash column:
I am trying to do it with a case statement but it isn't working:
select id, sum(rates) as new_rates, case
when count(id)>1 then sum(cash)/nullif(sum(rates),0))
else cash
end as new_cash
from Cash_table
group by id
You only need to group by id and aggregate:
select
id,
sum(cash) / (case count(*) when 1 then 1 else sum(rates) end) as new_cash,
sum(rates) as new_rates
from Cash_table
group by id
order by id
See the demo.
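A minimal sketch of the CASE COUNT(*) query, run on the question's rows with Python's sqlite3 module (SQLite standing in for the original database). The `1.0 *` factor is an addition: with integer columns SQLite, like SQL Server, truncates integer division, so without it id 3 would come out as 6 instead of 6.5.

```python
import sqlite3

# Rows from the question: (id, cash, rates)
rows = [(1, 50, 3), (2, 100, 4), (3, 70, 10), (3, 60, 10),
        (4, 13, 7), (5, 20, 8), (5, 10, 10), (6, 10, 5)]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Cash_table (id INTEGER, cash INTEGER, rates INTEGER)")
conn.executemany("INSERT INTO Cash_table VALUES (?, ?, ?)", rows)

# Single-row ids divide by 1 (cash unchanged); multi-row ids divide by sum(rates).
# 1.0 * forces real division instead of integer division.
result = conn.execute("""
    SELECT id,
           1.0 * SUM(cash) / (CASE COUNT(*) WHEN 1 THEN 1 ELSE SUM(rates) END) AS new_cash,
           SUM(rates) AS new_rates
    FROM Cash_table
    GROUP BY id
    ORDER BY id
""").fetchall()

for row in result:
    print(row)
```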
You can aggregate the rates and cash columns with the sum() function, grouping by id:
select
id,
sum(cash)/decode( sum( nvl(rates,0) ), 0 ,1, sum( nvl(rates,0) )) as new_cash,
sum(rates) as new_rates
from cash_table
group by id
nvl() treats NULL rates as 0 (Oracle does also support nullif(), so that would work here too)
decode() plays the role of the CASE expression and guards against division by zero: when the summed rates are 0, it divides by 1 instead

SQL - Overall average Points

I have a table like this:
[challenge_log]
User_id | challenge | Try | Points
==============================================
1 1 1 5
1 1 2 8
1 1 3 10
1 2 1 5
1 2 2 8
2 1 1 5
2 2 1 8
2 2 2 10
I want the overall average of the points. To do so, I believe I need 3 steps:
Step 1 - Get the MAX value (of points) of each user in each challenge:
User_id | challenge | Points
===================================
1 1 10
1 2 8
2 1 5
2 2 10
Step 2 - SUM all the MAX values of one user
User_id | Points
===================
1 18
2 15
Step 3 - The average
AVG = SUM(Points from step 2) / number of users = (18 + 15) / 2 = 16.5
Can you help me find a query for this?
You can get the overall average by dividing the total number of points by the number of distinct users. However, you need the maximum per challenge, so the sum is a bit more complicated. One way is with a subquery:
-- 1.0 * avoids integer division (16 instead of 16.5) in dialects such as SQL Server
select 1.0 * sum(Points) / count(distinct user_id)
from (select user_id, challenge, max(Points) as Points
      from challenge_log
      group by user_id, challenge
     ) cl;
You can also do this with one level of aggregation, by finding the maximum in the where clause:
select 1.0 * sum(Points) / count(distinct user_id)
from challenge_log cl
where not exists (select 1
                  from challenge_log cl2
                  where cl2.user_id = cl.user_id and
                        cl2.challenge = cl.challenge and
                        cl2.points > cl.points
                 );
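Both queries can be checked against the question's data with a sketch like the one below, which uses Python's sqlite3 module (SQLite standing in for the original database). The `Try` column is left out because neither query uses it, and the `1.0 *` factor keeps SQLite from truncating 33 / 2 to 16.

```python
import sqlite3

# Rows from the question: (user_id, challenge, points); Try is omitted
rows = [(1, 1, 5), (1, 1, 8), (1, 1, 10), (1, 2, 5), (1, 2, 8),
        (2, 1, 5), (2, 2, 8), (2, 2, 10)]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE challenge_log (user_id INTEGER, challenge INTEGER, points INTEGER)")
conn.executemany("INSERT INTO challenge_log VALUES (?, ?, ?)", rows)

# Step 1 in a subquery (best score per user and challenge),
# steps 2-3 in the outer query (total of the maxima / number of users)
subquery_avg = conn.execute("""
    SELECT 1.0 * SUM(points) / COUNT(DISTINCT user_id)
    FROM (SELECT user_id, challenge, MAX(points) AS points
          FROM challenge_log
          GROUP BY user_id, challenge) cl
""").fetchone()[0]

# Same result by keeping only rows with no higher score for that user/challenge
not_exists_avg = conn.execute("""
    SELECT 1.0 * SUM(points) / COUNT(DISTINCT user_id)
    FROM challenge_log cl
    WHERE NOT EXISTS (SELECT 1
                      FROM challenge_log cl2
                      WHERE cl2.user_id = cl.user_id
                        AND cl2.challenge = cl.challenge
                        AND cl2.points > cl.points)
""").fetchone()[0]

print(subquery_avg, not_exists_avg)  # 16.5 16.5
```

The NOT EXISTS form keeps every row that ties for the maximum, so if a user scored the same maximum twice in one challenge it would be counted twice; the subquery form does not have that issue.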
Try these on for size.
Overall Mean
select avg( Points ) as mean_score
from challenge_log
Per-Challenge Mean
select challenge ,
avg( Points ) as mean_score
from challenge_log
group by challenge
If you want to compute the mean of each user's highest score per challenge, you're not raising the level of complexity very much:
Overall Mean
select avg( high_score )
from ( select user_id ,
              challenge ,
              max( Points ) as high_score
       from challenge_log
       group by user_id , challenge
     ) t
Per-Challenge Mean
select challenge ,
       avg( high_score )
from ( select user_id ,
              challenge ,
              max( Points ) as high_score
       from challenge_log
       group by user_id , challenge
     ) t
group by challenge
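With the GROUP BY in place, the two high-score queries can be sketched against the question's data using Python's sqlite3 module (SQLite standing in for the original database; only the columns the queries need are created). This computes the mean of the per-challenge high scores, which is a different quantity from the 16.5 the question asks for.

```python
import sqlite3

# Rows from the question: (user_id, challenge, points)
rows = [(1, 1, 5), (1, 1, 8), (1, 1, 10), (1, 2, 5), (1, 2, 8),
        (2, 1, 5), (2, 2, 8), (2, 2, 10)]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE challenge_log (user_id INTEGER, challenge INTEGER, points INTEGER)")
conn.executemany("INSERT INTO challenge_log VALUES (?, ?, ?)", rows)

# Inner query: one high score per (user, challenge) pair
high_scores = """
    SELECT user_id, challenge, MAX(points) AS high_score
    FROM challenge_log
    GROUP BY user_id, challenge
"""

overall_mean = conn.execute(
    f"SELECT AVG(high_score) FROM ({high_scores}) t"
).fetchone()[0]

per_challenge = conn.execute(
    f"SELECT challenge, AVG(high_score) FROM ({high_scores}) t "
    "GROUP BY challenge ORDER BY challenge"
).fetchall()

print(overall_mean)   # 8.25
print(per_challenge)  # [(1, 7.5), (2, 9.0)]
```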
After step 1, run:
SELECT USER_ID, AVG(POINTS)
FROM STEP1
GROUP BY USER_ID
You can combine step 1 and 2 into a single query/subquery as follows:
Select BestShot.[User_ID], AVG(cast (BestShot.MostPoints as money))
from (select tLog.Challenge, tLog.[User_ID], MostPoints = max(tLog.points)
from dbo.tmp_Challenge_Log tLog
Group by tLog.User_ID, tLog.Challenge
) BestShot
Group by BestShot.User_ID
The subquery determines the most points for each user/challenge combination, and the outer query takes these maximum values and uses the AVG function to return their average. The outer GROUP BY makes AVG return one average per User_ID.

How to find the SQL medians for a grouping

I am working with SQL Server 2008
If I have a table like this:
Code Value
-----------------------
4 240
4 299
4 210
2 NULL
2 3
6 30
6 80
6 10
4 240
2 30
How can I find the median AND group by the Code column please?
To get a resultset like this:
Code Median
-----------------------
4 240
2 16.5
6 30
I really like this solution for median, but unfortunately it doesn't include Group By:
https://stackoverflow.com/a/2026609/106227
The solution using rank works nicely when each group has an odd number of members, i.e. when the median exists within the sample. When a group has an even number of members, the rank method falls down, e.g.
1
2
3
4
The median here is 2.5 (i.e. half the group is smaller, and half the group is larger) but the rank method will return 3. To get around this you essentially need to take the top value from the bottom half of the group, and the bottom value of the top half of the group, and take an average of the two values.
WITH CTE AS
( SELECT Code,
Value,
[half1] = NTILE(2) OVER(PARTITION BY Code ORDER BY Value),
[half2] = NTILE(2) OVER(PARTITION BY Code ORDER BY Value DESC)
FROM T
WHERE Value IS NOT NULL
)
SELECT Code,
(MAX(CASE WHEN Half1 = 1 THEN Value END) +
MIN(CASE WHEN Half2 = 1 THEN Value END)) / 2.0
FROM CTE
GROUP BY Code;
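The NTILE approach can be sketched with Python's sqlite3 module, which supports NTILE() and the other window functions from SQLite 3.25 onward; SQLite stands in for SQL Server here, the table name `T` follows the answer, and an ORDER BY is added only to make the output order predictable.

```python
import sqlite3

# Rows from the question: (code, value)
rows = [(4, 240), (4, 299), (4, 210), (2, None), (2, 3),
        (6, 30), (6, 80), (6, 10), (4, 240), (2, 30)]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE T (Code INTEGER, Value INTEGER)")
conn.executemany("INSERT INTO T VALUES (?, ?)", rows)

# Top of the lower half plus bottom of the upper half, divided by 2.0;
# for odd-sized groups both halves share the middle value.
medians = dict(conn.execute("""
    WITH CTE AS (
        SELECT Code, Value,
               NTILE(2) OVER (PARTITION BY Code ORDER BY Value)      AS half1,
               NTILE(2) OVER (PARTITION BY Code ORDER BY Value DESC) AS half2
        FROM T
        WHERE Value IS NOT NULL
    )
    SELECT Code,
           (MAX(CASE WHEN half1 = 1 THEN Value END) +
            MIN(CASE WHEN half2 = 1 THEN Value END)) / 2.0 AS Median
    FROM CTE
    GROUP BY Code
    ORDER BY Code
""").fetchall())

print(medians)  # {2: 16.5, 4: 240.0, 6: 30.0}
```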
Example on SQL Fiddle
In SQL Server 2012 and later, you can use PERCENTILE_CONT:
SELECT DISTINCT
Code,
Median = PERCENTILE_CONT(0.5) WITHIN GROUP (ORDER BY Value) OVER(PARTITION BY Code)
FROM T;
Example on SQL Fiddle
SQL Server 2008 does not have a function to calculate medians, but you could use the ROW_NUMBER function like this:
WITH RankedTable AS (
SELECT Code, Value,
ROW_NUMBER() OVER (PARTITION BY Code ORDER BY VALUE) AS Rnk,
COUNT(*) OVER (PARTITION BY Code) AS Cnt
FROM MyTable
)
SELECT Code, Value
FROM RankedTable
WHERE Rnk = Cnt / 2 + 1
To elaborate a bit on this solution, consider the output of the RankedTable CTE:
Code Value Rnk Cnt
---------------------------
4 240 2 3 -- Median
4 299 3 3
4 210 1 3
2 NULL 1 2
2 3 2 2 -- Median
6 30 2 3 -- Median
6 80 3 3
6 10 1 3
Now from this result set, if you only return those rows where Rnk equals Cnt / 2 + 1 (integer division), you get only the rows with the median value for each group.
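A sketch of the ROW_NUMBER() query run on the eight rows shown in the table above, using Python's sqlite3 module as a stand-in for SQL Server (SQLite 3.25+ assumed; an ORDER BY is added for predictable output). Like SQL Server, SQLite sorts NULL first in ascending order, so for code 2 the NULL row takes Rnk 1 and the query returns 3; for an even-sized group it picks the upper middle row rather than averaging the two middle values.

```python
import sqlite3

# The rows shown in the RankedTable output above: (code, value)
rows = [(4, 240), (4, 299), (4, 210), (2, None), (2, 3),
        (6, 30), (6, 80), (6, 10)]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE MyTable (Code INTEGER, Value INTEGER)")
conn.executemany("INSERT INTO MyTable VALUES (?, ?)", rows)

# Cnt / 2 is integer division, so Rnk = Cnt / 2 + 1 is the middle row
# for odd counts and the upper middle row for even counts.
picked = dict(conn.execute("""
    WITH RankedTable AS (
        SELECT Code, Value,
               ROW_NUMBER() OVER (PARTITION BY Code ORDER BY Value) AS Rnk,
               COUNT(*) OVER (PARTITION BY Code) AS Cnt
        FROM MyTable
    )
    SELECT Code, Value
    FROM RankedTable
    WHERE Rnk = Cnt / 2 + 1
    ORDER BY Code
""").fetchall())

print(picked)  # {2: 3, 4: 240, 6: 30}
```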