Second largest value in a group - sql

I want to select the second largest value in each group. How can I do that in SQL?
For example with the below table,
IDs value
ID1 2
ID1 3
ID1 4
ID2 1
ID2 2
ID2 5
When grouping by IDs, I want this output
IDs value
ID1 3
ID2 2
Thanks.

Use row_number():
select t.*
from (select t.*, row_number() over (partition by id order by value desc) as seqnum
from t
) t
where seqnum = 2;

Alternate way - you can use dense_rank().
It ensures the query returns the second-largest distinct value even when two or more rows tie for the largest value.
select t.*
from (select t.*, dense_rank() over (partition by id order by value desc) as rrank
from t
) t
where rrank = 2;
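To see where the two answers differ, here is a minimal sketch that runs both queries through Python's built-in sqlite3 module. SQLite stands in for whatever database the asker uses (an assumption), and the ID3 rows are added for illustration so that one group has a tied maximum; they are not part of the question's data.

```python
import sqlite3

# In-memory SQLite database standing in for the asker's table (assumption).
conn = sqlite3.connect(":memory:")
conn.execute("create table t (id text, value integer)")
conn.executemany(
    "insert into t values (?, ?)",
    [("ID1", 2), ("ID1", 3), ("ID1", 4),
     ("ID2", 1), ("ID2", 2), ("ID2", 5),
     # Hypothetical extra group with a tied maximum, added for illustration.
     ("ID3", 7), ("ID3", 7), ("ID3", 6)],
)

def second_by(func):
    """Return (id, value) rows that the given window function ranks 2 per id."""
    sql = f"""
        select distinct id, value
        from (select t.*, {func}() over (partition by id order by value desc) as rnk
              from t) as ranked
        where rnk = 2
        order by id
    """
    return conn.execute(sql).fetchall()

by_row_number = second_by("row_number")
by_dense_rank = second_by("dense_rank")
print(by_row_number)  # for ID3 this picks the duplicate of the maximum
print(by_dense_rank)  # for ID3 this picks the second distinct value
```

With no ties the two queries agree. With a tied maximum, row_number() returns the duplicate of the largest value, while dense_rank() returns the next distinct value, so the choice depends on which meaning of "second largest" is wanted.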

Related

How to get the count of the modal value in PostgreSQL?

I have calculated the modal value of a column in a JOIN.
mode() WITHIN GROUP (ORDER BY col) AS modal_col
I would also like the frequency of the modal value. i.e. how often does this value appear?
I have tried to simply nest this in the count function, but postgres does not allow this.
count(mode() WITHIN GROUP (ORDER BY col))
ERROR: aggregate function calls cannot be nested
I have also tried:
row_number() OVER (PARTITION BY id ORDER BY count(*))
I have also tried the RANK() function, but these simply give me the row numbers
I would like a simple count of the occurrence of modal value.
Input
id col
id1 a
id1 a
id1 b
id2 a
id2 a
id3 a
id3 b
id3 c
id3 c
id3 c
id3 c
Output
id col_mode mode_count
id1 a 2
id2 a 2
id3 c 4
EDIT
SELECT DISTINCT ON (t1.id)
t1.id,
mode() WITHIN GROUP (ORDER BY t2.col) AS modal_col,
count(*) OVER(PARTITION BY t1.id, t2.col) AS mode_count
FROM schema.foo t2
JOIN schema.bar t1
ON t2.id2 = t1.id2
ORDER BY t1.id, count(*) OVER(PARTITION BY t1.id, t2.col) DESC
;
Thanks to Danny for the pointer.
I tried the above, but Postgres raises an error and requires that I group by both t1.id AND t2.col. Do I need to create an intermediate scratch table? I do not want to group by both columns, just t1.id.
select distinct on (id)
id
,col as col_mode
,count(*) over(partition by id, col) as mode_count
from t
order by id, count(*) over(partition by id, col) desc
id col_mode mode_count
id1 a 2
id2 a 2
id3 c 4
Here's another answer using mode() within group
select id
,mode() WITHIN GROUP (order by col) as col_mode
,max(mode_count) as mode_count
from
(
select *
,count(*) over(partition by id, col) as mode_count
from t
) t
group by id
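For databases without mode() or DISTINCT ON, the same result can be sketched with a plain GROUP BY followed by row_number(). This is a swapped-in, more portable technique rather than either answer's exact query; it is run here on SQLite through Python's sqlite3 module as a stand-in for Postgres (an assumption), using the question's input rows. Ties between equally frequent values are broken alphabetically by col, which is a choice made for this sketch.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("create table t (id text, col text)")
conn.executemany(
    "insert into t values (?, ?)",
    [("id1", "a"), ("id1", "a"), ("id1", "b"),
     ("id2", "a"), ("id2", "a"),
     ("id3", "a"), ("id3", "b"),
     ("id3", "c"), ("id3", "c"), ("id3", "c"), ("id3", "c")],
)

mode_rows = conn.execute("""
    select id, col as col_mode, cnt as mode_count
    from (
        -- Rank each (id, col) pair by its frequency within the id.
        select id, col, cnt,
               row_number() over (partition by id order by cnt desc, col) as rn
        from (
            -- Count how often each value occurs per id.
            select id, col, count(*) as cnt
            from t
            group by id, col
        ) as counts
    ) as ranked
    where rn = 1
    order by id
""").fetchall()

for row in mode_rows:
    print(row)
```

Counting in the innermost query keeps the aggregate and the window function in separate steps, which avoids the nesting error from the question.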

How to select the top 3 values from a group based on date and exclude duplicate value?

I have three columns: one holds an ID, one a value, and one a date. For example, the ID column has ID1, ID2, ID3, and each ID has numeric values, say 1, 2, 3, 4, 5.
How do I get only 3 results for each ID, the most recent ones by date (descending)?
I am using Sybase SQL. Is there any way I can write this?
I tried to use Row_number() and rank(), but my SQL tool does not support either of those functions.
ID value Date
1 3 20190511
1 1 20190503
1 5 20190401
2 2 20190520
2 1 20190514
2 4 20190503
3 1 20190516
3 5 20190415
3 3 20190402
If you don't have row_number, try this:
SELECT *
FROM yourTable t1
WHERE (SELECT COUNT(*)
FROM yourTable t2
WHERE t1.id = t2.id
AND t1.date < t2.date) < 3
So a row does not appear if its id has 3 or more rows with a later date.
With row_number:
SELECT *
FROM (SELECT *, ROW_NUMBER() OVER (PARTITION BY id ORDER BY date DESC) as rn
FROM YourTable t1
) as t
WHERE t.rn <= 3
This assumes an id can't have multiple rows on the same date. If it can, you may want to use RANK() or DENSE_RANK() and decide how to handle ties.
One method uses a correlated subquery with in:
select t.*
from t
where t.date in (select top (3) t2.date
from t t2
where t2.id = t.id
order by t2.date desc
);
Note that this assumes that the dates are unique.
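As a quick check of the approach that needs no window functions, here is a minimal sketch of the correlated COUNT(*) query run on SQLite through Python's sqlite3 module. SQLite is only a stand-in for Sybase (an assumption), and the two oldest rows for ID 1 are added for illustration so that something actually gets filtered out; they are not in the question's data.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("create table yourTable (id integer, value integer, date text)")
conn.executemany(
    "insert into yourTable values (?, ?, ?)",
    [(1, 3, "20190511"), (1, 1, "20190503"), (1, 5, "20190401"),
     (2, 2, "20190520"), (2, 1, "20190514"), (2, 4, "20190503"),
     (3, 1, "20190516"), (3, 5, "20190415"), (3, 3, "20190402"),
     # Hypothetical older rows for ID 1, added so the filter has work to do.
     (1, 2, "20190301"), (1, 4, "20190215")],
)

# Keep a row only if fewer than 3 rows of the same id have a later date.
top3 = conn.execute("""
    select t1.id, t1.value, t1.date
    from yourTable t1
    where (select count(*)
           from yourTable t2
           where t2.id = t1.id
             and t2.date > t1.date) < 3
    order by t1.id, t1.date desc
""").fetchall()

for row in top3:
    print(row)
```

The dates are stored as YYYYMMDD text here, so plain string comparison orders them correctly. If two rows of one id share a date, more than three rows can come back for that id.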

SQL script to identify row based on min value

How to write a SQL statement (in SQL Server) to get a row with minimum value based on two columns?
For example:
Type Rank Val1 val2
------------------------------
A 6 486.57 38847
B 6 430 56345
C 5 390 99120
D 5 329 12390
E 4 350 11109
E 4 320 11870
The SQL statement should return the last row in above table, because it has min value for Rank, and Val1.
Something like this:
select *
from Table1
where rank = (select min(rank) from Table1)
and Val1 = (select min(Val1)
from Table1
where rank = (select min(rank) from Table1))
Or this, if you like a simple life:
select top 1 *
from Table1
order by rank asc, Val1 asc
with cte as (
select *, row_number() over (order by rank, val1) as rn
from dbo.yourTable
)
select *
from cte
where rn = 1;
The idea here is that I'm assigning a 1..n enumeration to the rows based on rank and, in the case of ties, Val1. I return the row that takes the value of 1. If there is the possibility of a tie, use rank() instead of row_number().
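The ordering logic can be checked on the question's sample rows. The sketch below uses Python's sqlite3 module with SQLite standing in for SQL Server (an assumption), so TOP 1 becomes LIMIT 1; the column names type and rank are quoted only to avoid any clash with function or keyword names.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    'create table Table1 ("type" text, "rank" integer, val1 real, val2 integer)'
)
conn.executemany(
    "insert into Table1 values (?, ?, ?, ?)",
    [("A", 6, 486.57, 38847), ("B", 6, 430, 56345),
     ("C", 5, 390, 99120), ("D", 5, 329, 12390),
     ("E", 4, 350, 11109), ("E", 4, 320, 11870)],
)

# Simple version: sort by rank, then Val1, and keep the first row.
top_row = conn.execute(
    'select * from Table1 order by "rank" asc, val1 asc limit 1'
).fetchone()

# Window-function version: number the rows in the same order, keep number 1.
cte_row = conn.execute("""
    with cte as (
        select Table1.*, row_number() over (order by "rank", val1) as rn
        from Table1
    )
    select "type", "rank", val1, val2
    from cte
    where rn = 1
""").fetchone()

print(top_row)
print(cte_row)
```

Both queries pick the row with the lowest Rank and, within that Rank, the lowest Val1.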
I'm assuming that Type is the primary key for your table, and that you only want a row that has both the lowest Val1 and lowest Val2 (so if one row has the lowest Val1, but not the lowest Val2, this returns no data). I'm not sure about these assumptions, but your question could probably be clarified a bit.
Here's the code:
SELECT
*
FROM
Table1
WHERE
Type IN
(
SELECT
Type
FROM
Table1
WHERE
Val1 = (SELECT MIN(Val1) FROM Table1)
AND val2 = (SELECT MIN(val2) FROM Table1)
)

Get MAX ID from multiple records in table where ID2 is the same and where Value 3<>0

In the above screenshot I need to get the subplanid where MAX(ID) in that group of subplanid does not have a formularymixtype of 0
I think this does what you want:
select t.*
from (select t.*,
row_number() over (partition by subplanid order by id desc) as seqnum
from t
) t
where seqnum = 1 and formularymixtype <> 0;
This query returns each subplanid whose latest row (the highest id within that subplanid) has a formularymixtype other than 0:
SELECT t.subplanid FROM t
WHERE t.id = (SELECT MAX(t2.id) FROM t t2 WHERE t2.subplanid = t.subplanid)
AND t.formularymixtype <> 0
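The order of the two conditions matters: the latest row per subplanid has to be picked first, and only then checked for formularymixtype <> 0. Since the screenshot is not available, the rows below are entirely made up for illustration, and SQLite (via Python's sqlite3 module) stands in for the asker's database; treat this as a sketch under those assumptions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("create table t (id integer, subplanid integer, formularymixtype integer)")
conn.executemany(
    "insert into t values (?, ?, ?)",
    # Hypothetical data: subplan 100 ends on a 0, subplans 200 and 300 do not.
    [(1, 100, 1), (2, 100, 0),
     (3, 200, 0), (4, 200, 2),
     (5, 300, 3)],
)

# Pick the latest row per subplanid first, then test its formularymixtype.
latest_nonzero = conn.execute("""
    select subplanid, id
    from (select t.*,
                 row_number() over (partition by subplanid order by id desc) as seqnum
          from t) as ranked
    where seqnum = 1 and formularymixtype <> 0
    order by subplanid
""").fetchall()

# Filtering before taking the maximum answers a different question:
# "the latest non-zero row", which wrongly keeps subplan 100.
filtered_first = conn.execute("""
    select subplanid, max(id)
    from t
    where formularymixtype <> 0
    group by subplanid
    order by subplanid
""").fetchall()

print(latest_nonzero)
print(filtered_first)
```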

get intervals of nonchanging value from a sequence of numbers

I need to summarize a sequence of values into intervals of non-changing values: begin, end, and value for each such interval. I can easily do it in PL/SQL but would like a pure SQL solution for both performance and educational reasons. I have been trying for some time to solve it with analytic functions, but can't figure out how to properly define the windowing clause. The problem I am having is with a repeated value.
Simplified example -
given input:
id value
1 1
2 1
3 2
4 2
5 1
I'd like to get output
from to val
1 2 1
3 4 2
5 5 1
You want to identify groups of adjacent values. One method is to use lag() to find the beginning of the sequence, then a cumulative sum to identify the groups.
Another method is the difference of row number:
select value, min(id) as from_id, max(id) as to_id
from (select t.*,
(row_number() over (order by id) -
row_number() over (partition by value order by id)
) as grp
from t
) t
group by grp, value;
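The lag() approach mentioned at the start of this answer is not spelled out, so here is one possible sketch of it: flag each row where the value differs from the previous row, then take a running sum of the flags to number the runs. It is run on SQLite through Python's sqlite3 module as a stand-in for the asker's database (an assumption); the names is_start and grp are made up for this example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("create table t (id integer, value integer)")
conn.executemany(
    "insert into t values (?, ?)",
    [(1, 1), (2, 1), (3, 2), (4, 2), (5, 1)],
)

intervals = conn.execute("""
    select min(id) as from_id, max(id) as to_id, value
    from (
        -- Running sum of the start flags gives each run its own number.
        select id, value, sum(is_start) over (order by id) as grp
        from (
            -- 1 when the value changes (or on the first row), else 0.
            select id, value,
                   case when value = lag(value) over (order by id)
                        then 0 else 1 end as is_start
            from t
        ) as flagged
    ) as numbered
    group by grp, value
    order by from_id
""").fetchall()

for row in intervals:
    print(row)
```

On the first row lag() returns NULL, the comparison is not true, and the CASE falls through to 1, so the first row always starts a run.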
This uses a CTE to tag each row with a group number for its run of unchanged values, then groups on that number to get the range of each run.
CREATE TABLE #temp (
ID INT NOT NULL IDENTITY(1,1),
[Value] INT NOT NULL
)
GO
INSERT INTO #temp ([Value])
SELECT 1 UNION ALL
SELECT 1 UNION ALL
SELECT 2 UNION ALL
SELECT 2 UNION ALL
SELECT 1;
WITH Marked AS (
SELECT
*,
grp = ROW_NUMBER() OVER (ORDER BY ID)
- ROW_NUMBER() OVER (PARTITION BY Value ORDER BY ID)
FROM #temp
)
SELECT MIN(ID) AS [From], MAX(ID) AS [To], [VALUE]
FROM Marked
GROUP BY grp, Value
ORDER BY MIN(ID)
DROP TABLE #temp;
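To see why the difference of the two row numbers identifies a run, it helps to look at the intermediate columns. The sketch below prints them for the question's five rows, using SQLite through Python's sqlite3 module as a stand-in for SQL Server (an assumption); the column names rn_all and rn_in_value are made up for this example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("create table t (id integer, value integer)")
conn.executemany(
    "insert into t values (?, ?)",
    [(1, 1), (2, 1), (3, 2), (4, 2), (5, 1)],
)

marked = conn.execute("""
    select id, value,
           row_number() over (order by id) as rn_all,
           row_number() over (partition by value order by id) as rn_in_value,
           row_number() over (order by id)
             - row_number() over (partition by value order by id) as grp
    from t
    order by id
""").fetchall()

# Columns: id, value, rn_all, rn_in_value, grp
for row in marked:
    print(row)
```

Within one run both row numbers grow by 1 per row, so their difference stays constant. The difference alone is not unique across values, though: here ids 3 and 4 (value 2) and id 5 (value 1) all end up with the same grp, which is why the final query groups by both grp and value.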