How to find the highest and second highest entry in SQL in a single query using `GROUP BY`? - sql

Let this be the table that is provided.
PID
TID
Type
Freq
1
1
A
3
1
1
A
2
1
1
A
1
1
1
B
3
1
2
A
4
1
2
B
5
I want to write a query to get an output like this.
PID
TID
Type
Max_Freq_1
Max_Freq_2
1
1
A
3
2
1
1
B
3
NULL
1
2
A
4
NULL
1
2
B
5
NULL
That is, given a combination of PID, TID, Type, what is the highest and second-highest frequency? If there aren't a sufficient number of entries in the table, then put second highest as NULL

If your database can use the window functions, then the top 2 Freq can be calculated via the DENSE_RANK function.
SELECT PID, TID, Type
, MAX(CASE WHEN Rnk = 1 THEN Freq END) AS Max_Freq_1
, MAX(CASE WHEN Rnk = 2 THEN Freq END) AS Max_Freq_2
FROM
(
SELECT PID, TID, Type, Freq
, DENSE_RANK() OVER (PARTITION BY PID, TID, Type ORDER BY Freq DESC) AS Rnk
FROM YourTable t
) q
GROUP BY PID, TID, Type
ORDER BY PID, TID, Type
pid
tid
type
max_freq_1
max_freq_2
1
1
A
3
2
1
1
B
3
null
1
2
A
4
null
1
2
B
5
null
If ROW_NUMBER isn't available, then try this.
SELECT PID, TID, Type
, MAX(CASE WHEN Rnk = 1 THEN Freq END) AS Max_Freq_1
, MAX(CASE WHEN Rnk = 2 THEN Freq END) AS Max_Freq_2
FROM
(
SELECT t1.PID, t1.TID, t1.Type, t1.Freq
, COUNT(DISTINCT t2.Freq) AS Rnk
FROM YourTable t1
LEFT JOIN YourTable t2
ON t2.PID = t1.PID
AND t2.TID = t1.TID
AND t2.Type = t1.Type
AND t2.Freq >= t1.Freq
GROUP BY t1.PID, t1.TID, t1.Type, t1.Freq
) q
GROUP BY PID, TID, Type
ORDER BY PID, TID, Type
Demo on db<>fiddle here

This is what I came up with on PostgreSQL. Using the window function like row_number is the easiest way to get the result you want.
with t as (
select *, row_number() over (partition by pid, tid, "type" order by freq desc) as r
from test_so
) select pid, tid, "type", max(case when r = 1 then freq end) as "highest", max(case when r = 2 then freq end) as "second_highest"
from t
group by pid, tid, "type"

Related

Insert records from multiple rows of table to multiple columns of other table

I have an existing table structure with sample data like
Id
BookingId
Value
TypeId
AddedTime
1
100
10
T1
2021-03-22 08:51:52.6333333
2
100
20
T2
2021-03-22 08:50:55.8133333
3
100
30
T3
2021-03-22 08:50:22.1033333
4
200
50
T1
2021-03-22 08:50:22.1033333
5
200
60
T2
2021-03-22 08:50:22.1000000
6
200
70
T3
2021-03-22 08:50:22.0800000
and now data model is changed and it becomes like
Id
BookingId
Type1Value
Type2Value
Type3Value
AddedTime
Please help me what would be query to copy data from previous table to new table.
Output should be something like
Id
BookingId
Type1Value
Type2Value
Type3Value
AddedTime
1
100
10
20
30
2
200
50
60
70
I tried:
select BookingId
, Type1Value = max(case when RN=1 then Value else null end)
, Type2Value = max(case when RN=2 then Value else null end)
, Type3Value = max(case when RN=3 then Value else null end)
from (
select *
, rn = Row_Number() over (Partition By TypeId Order by AddedTime)
from Values_M
) a
where rn <= 3
group by BookingId
This will gives you the required result using conditional case expression.
Using row_number() to generate new running number Id
select Id = row_number() over (order by BookingId),
BookingId = BookingId,
Type1Value = max(case when TypeId = 'T1' then Value end),
Type2Value = max(case when TypeId = 'T2' then Value end),
Type3Value = max(case when TypeId = 'T3' then Value end),
AddedTime = min(AddedTime)
from Values_M
group by BookingId
dbfiddle
select BookingId, min(T1) as Type1Value, min(T2) as Type2Value, min(T3) as Type3Value
from table1
pivot (sum(value) for Typeid in (T1,T2,T3)) as PivotTable
group by BookingId
You can use row_number() and self join.
with cte as (
select Id, BookingId, [Value], TypeId, AddedTime
, row_number() over (partition by BookingId order by id asc) rn
from Values_M
)
select C1.rn, C1.BookingId, C1.[Value] Type1Value, C2.[Value] Type2Value, C3.[Value] Type3Value, C1.AddedTime
from cte C1
inner join cte C2 on C2.BookingId = C1.BookingId and C2.rn = 2
inner join cte C3 on C3.BookingId = C1.BookingId and C3.rn = 3
where C1.rn = 1
order by BookingId asc;

count rows which have max value less than specified parameter

I want to find in my table, max value which is less than specified in parameter and get count of rows that have the same value as max value. For example in my table I have values: (4,1,3,1,4,4,10), and it is list of parameters in string "2,9,10,4". I have to split string to separate parameters. Base on this sample values I want to get something like that:
param | max value | count
2 | 1 | 2
9 | 4 | 3
10 | 4 | 3
4 | 3 | 1
And it is my sample query:
select
[param]
, max([val]) [max_value_by_param]
, max(count) [count]
from(
select
n.value as [param]
,a.val
, count(*) as [count]
from (--mock of table
select 1 as val union all
select 3 as val union all
select 4 as val union all
select 1 as val union all
select 3 as val union all
select 4 as val union all
select 4 as val union all
select 10 as val
) a
join (select [value] from string_split('2,9,10,4', ',')) n--list of params
on a.val < n.[value]
group by n.value, a.val
) tmp
group by [param]
Is it possible to do it better/easier ?
Here is a way to express this using apply:
select s.value as param, a.val, a.cnt
from string_split('2,9,10,4', ',') s outer apply
(select top (1) a.val, count(*) as cnt
from a
group by a.val
having a.val < s.value
order by a.val desc
) a;
Here is a db<>fiddle.
But the fastest method is probably going to be:
with av as (
select a.val, count(*) as cnt
from a
group by a.val
union all
select s.value, null as cnt
from string_split('2,9,10,4', ',') s
)
select val, a_val, a_cnt
from (select av.*,
max(case when cnt is not null then val end) over (order by val, (case when cnt is null then 1 else 2 end)) as a_val,
max(case when cnt is not null then cnt end) over (order by val, (case when cnt is null then 1 else 2 end)) as a_cnt
from av
) av
where cnt is null;
This only aggregates the data once and should return all parameters, even those with no preceding values in a.

How to Generate Row number Partition by two column match in sql

Tbl1
---------------------------------------------------------
Id Date Qty ReOrder
---------------------------------------------------------
1 1-1-18 1 3
2 2-1-18 0 3
3 3-1-18 2 3
4 4-1-18 3< >3
5 5-1-18 2 3
6 6-1-18 0 3
7 7-1-18 1 3
8 8-1-18 0 3
---------------------------------------------------------
I want the result like below
---------------------------------------------------------
Id Date Qty ReOrder
---------------------------------------------------------
1 1-1-18 1 3
5 5-1-18 2 3
---------------------------------------------------------
if ReOrder not same with Qty then date will be same upto after reorder=Qty
You can use cumulative approach with row_number() function :
select top (1) with ties *
from (select *, max(case when qty = reorder then 'v' end) over (order by id desc) grp
from table
) t
order by row_number() over(partition by grp order by id);
Unfortunately this will require SQL Server, But you can also do:
select *
from (select *, row_number() over(partition by grp order by id) seq
from (select *, max(case when qty = reorder then 'v' end) over (order by id desc) grp
from table
) t
) t
where seq = 1;

Retrieve the minimum value of a column from the max value of another column

I have a hive table t1 that looks like this:
ID Score1 score2
1 4 11
1 5 12
1 5 13
2 3 14
2 3 15
2 2 12
2 2 11
3 6 10
3 6 11
3 6 12
I want for each ID, to select the max value of score1, and if the max value exists more than once, then from the rows that contain max(score1) I want to get min(score2).
So, I want the minimum score2 of the maximum score1 rows, the results should be something like this
ID Score1 score2
1 5 12
2 3 14
3 6 10
Most of the ideas I have turn this to be a very complicated query, and I think there is a simple solution for it that I am not able to find yet.
Any ideas?
Use window functions:
select t.*
from (select t.*,
row_number() over (partition by id order by score1 asc, score2 desc) as seqnum
from t
) t
where seqnum = 1;
Try:
SELECT Z.ID, Z.SCORE1, MIN(SCORE2) AS SCORE2
FROM
(SELECT A.ID, A.SCORE FROM
YOUR_TABLE A
INNER JOIN
(SELECT ID, MAX(SCORE1) FROM YOUR_TABLE GROUP BY ID) B
ON A.ID = B.ID AND A.SCORE1 = B.SCORE1
GROUP BY A.ID, A.SCORE
HAVING COUNT(*)>1
) Z
INNER JOIN YOUR_TABLE C
ON Z.ID = C.ID AND Z.SCORE1 = C.SCORE1
GROUP BY Z.ID, Z.SCORE1;
You can do this with window functions:
SELECT ID, score1, MIN(score2) AS score2
FROM (
SELECT score1, score2, ID
FROM (
SELECT score1, score2, ID
FROM MyTable
QUALIFY RANK OVER(PARTITION BY ID ORDER BY score1 DESC) > 1
) src
QUALIFY COUNT() OVER(PARTITION BY ID) > 1
) src
GROUP BY 1,2
Sorry, writing this from my phone...can't format it well, there may be syntax errors too.
select id, min(score2)
from table1 t
inner join (
select id, max(score1) maxscore1 group by id
) d on t.id = d.id and t.score1 = d.maxscore1
group by t.id
having count(*) > 1 # if the max value exists more than once
an alternative query, if the db does support "analyic functions", is
select
id, min(score2)
from (
select id, score1, score2
, count(case when score1 = max(score1) over(partition by id) then 1 end) count_max
from table1
) d
where count_max > 1 -- if the max value exists more than once
group by
id

selecting the highest count for a categorical variable when grouping

I have the following table:
custID Cat
1 A
1 B
1 B
1 B
1 C
2 A
2 A
2 C
3 B
3 C
4 A
4 C
4 C
4 C
What I need is the most efficient way to aggregate by CustID in such a manner that I obtain the most frequent category (cat), the second most frequent and the third. The output of the above should be
most freq 2nd most freq 3rd most freq
1 B A C
2 A C Null
3 B C Null
4 C A Null
When there is a tie in the count I do not really care what is first and what is second. For example for customer 1 2nd most freq and 3rd most freq could be swapped because each of them occur 1 time only.
Any sql would be fine, preferable hive sql.
Thank you
Try to use group by twice and dense_rank() to sort accorting to the cat count. Actually I'm not 100% sure , but I guess it should work in hive as well.
select custId,
max(case when t.rn = 1 then cat end) as [most freq],
max(case when t.rn = 2 then cat end) as [2nd most freq],
max(case when t.rn = 3 then cat end) as [3th most freq]
from
(
select custId, cat, dense_rank() over (partition by custId order by count(*) desc) rn
from your_table
group by custId, cat
) t
group by custId
demo
According to the comments I add slightly modified solution that conforms with Hive SQL
select custId,
max(case when t.rn = 1 then cat else null end) as most_freq,
max(case when t.rn = 2 then cat else null end) as 2nd_most_freq,
max(case when t.rn = 3 then cat else null end) as 3th_most_freq
from
(
select custId, cat, dense_rank() over (partition by custId order by ct desc) rn
from (
select custId, cat, count(*) ct
from your_table
group by custId, cat
) your_table_with_counts
) t
group by custId
Hive SQL demo
SELECT journal, count(*) as frequency
FROM ${hiveconf:TNHIVE}
WHERE journal IS NOT NULL
GROUP BY journal
ORDER BY frequency DESC
LIMIT 5;