Keep minimum value of field A and corresponding value of field B in SQL Server

Say I have the following table:
Category rank score
a 3 100
a 1 105
a 2 110
b 2 102
b 7 107
b 3 95
I would like to know both the most efficient and the most visually elegant way of getting the lines having the minimum rank for each category.
In my example the result would be
Category rank score
a 1 105
b 2 102
The solutions I came up with seem inefficient and ugly for something that seems quite straightforward.

A typical solution is to use row_number():
select category, rank, score
from (select t.*,
row_number() over (partition by category order by rank) as seqnum
from t
) t
where seqnum = 1;
Whether or not you think this is elegant is a matter of opinion.

The solution below does the same thing with a CTE (common table expression):
with cte as
(
select category, rank, score, ROW_NUMBER() OVER(PARTITION BY category ORDER BY rank ) AS row_num
from t
)
select category, rank, score from cte
where row_num=1
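
As a quick check of the row_number() approach, here is a minimal sketch that runs the same query against the question's sample data. It uses Python's built-in sqlite3 module as a stand-in for SQL Server (an assumption made only so the example is self-contained; it needs SQLite 3.25 or later for window functions), and the column rank is quoted to be safe.

```python
import sqlite3

# In-memory table mirroring the question's sample data.
# The table name "t" comes from the answers; SQLite stands in for SQL Server.
conn = sqlite3.connect(":memory:")
conn.execute('CREATE TABLE t (category TEXT, "rank" INTEGER, score INTEGER)')
conn.executemany(
    "INSERT INTO t VALUES (?, ?, ?)",
    [("a", 3, 100), ("a", 1, 105), ("a", 2, 110),
     ("b", 2, 102), ("b", 7, 107), ("b", 3, 95)],
)

# Number the rows within each category by ascending rank, then keep row 1.
query = """
SELECT category, "rank", score
FROM (
    SELECT t.*,
           ROW_NUMBER() OVER (PARTITION BY category ORDER BY "rank") AS seqnum
    FROM t
) AS numbered
WHERE seqnum = 1
ORDER BY category
"""
min_rank_rows = conn.execute(query).fetchall()
print(min_rank_rows)  # [('a', 1, 105), ('b', 2, 102)]
```

The subquery and the CTE forms are interchangeable here; the choice is mostly a matter of readability.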

Related

SQL select 1 row out of several rows that have similar values

I have a table like this:
ID OtherID Date
1 z 2022-09-19
1 b 2021-04-05
2 e 2022-04-05
3 t 2022-07-08
3 z 2021-03-02
I want a table like this:
ID OtherID Date
1 z 2022-09-19
2 e 2022-04-05
3 t 2022-07-08
That is, one ID-OtherID pair per ID, chosen as the row with the most recent Date.
The problem I have now is that the relationship between ID and OtherID is 1:M.
I've looked at SELECT DISTINCT, GROUP BY and LAG, but I couldn't figure it out. I'm sorry if this is a duplicate question. I couldn't find the right keywords to search for the answer.
Update: I use Postgres but would like to know other SQL as well.
This works in many DBMSs (recent versions of Postgres, MySQL and others), but you may need to adapt it for other systems. You could use a CTE, a join, or a subquery such as this:
select id, otherid, date
from (
select id, otherid, date,
rank() over (partition by id order by date desc) as id_rank
from my_table
)z
where id_rank = 1
id otherid date
1 z 2022-09-19T00:00:00.000Z
2 e 2022-04-05T00:00:00.000Z
3 t 2022-07-08T00:00:00.000Z
You can use a Common Table Expression (CTE) with ROW_NUMBER() to number the rows within each ID from the most recent Date to the oldest, then keep only the first row for each ID with WHERE rn = 1:
WITH cte AS
(SELECT ID,
OtherID,
Date,
ROW_NUMBER() OVER (PARTITION BY ID ORDER BY Date DESC) AS rn
FROM sample_table)
SELECT ID,
OtherID,
Date
FROM cte
WHERE rn = 1;
Result:
ID OtherID Date
1 z 2022-09-19
2 e 2022-04-05
3 t 2022-07-08
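
The two answers above differ in one respect that the sample data does not expose: rank() returns every row tied for the most recent date, while row_number() returns exactly one row per ID. The sketch below illustrates this with Python's sqlite3 module (assuming SQLite 3.25 or later). The extra row (3, 'q', '2022-07-08') is hypothetical and added only to create a tie; the helper name latest_per_id is also an assumption for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE my_table (id INTEGER, otherid TEXT, date TEXT)")
conn.executemany(
    "INSERT INTO my_table VALUES (?, ?, ?)",
    [(1, "z", "2022-09-19"), (1, "b", "2021-04-05"),
     (2, "e", "2022-04-05"),
     (3, "t", "2022-07-08"), (3, "z", "2021-03-02"),
     (3, "q", "2022-07-08")],  # hypothetical row that ties with (3, 't')
)

def latest_per_id(window_function):
    """Return the rows numbered 1 per id by the given window function."""
    # The function name is interpolated from a fixed whitelist only.
    assert window_function in ("RANK", "ROW_NUMBER")
    sql = f"""
        SELECT id, otherid, date
        FROM (
            SELECT id, otherid, date,
                   {window_function}() OVER (PARTITION BY id ORDER BY date DESC) AS rn
            FROM my_table
        ) AS ranked
        WHERE rn = 1
        ORDER BY id, otherid
    """
    return conn.execute(sql).fetchall()

with_rank = latest_per_id("RANK")              # keeps both tied rows for id 3
with_row_number = latest_per_id("ROW_NUMBER")  # keeps one arbitrary row for id 3
print(len(with_rank), len(with_row_number))    # 4 3
```

If ties are possible and exactly one row per ID is required, adding a tie-breaker column to the ORDER BY of the window makes the row_number() result deterministic.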

SQL Getting row number only when the value is different from all previous values

I want the count to increase by one only when the value has not been shown before. The base table is:
rownum product
1 coke
2 coke
3 burger
4 burger
5 chocolate
6 apple
7 coke
8 burger
The goal is:
rownum product
1 coke
1 coke
2 burger
2 burger
3 chocolate
4 apple
4 coke
4 burger
I am thinking of comparing the current row with all previous rows, but I am having difficulty referring to all previous rows. Thank you!
This is a gaps-and-islands problem. Here is one approach using window functions: the idea is to use a window sum that increments every time the "first" occurrence of a product is seen:
select t.*,
sum(case when rn = 1 then 1 else 0 end) over(order by rownum) new_rownum
from(
select t.*, row_number() over(partition by product order by rownum) rn
from mytable t
) t
order by rownum
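
To see the running sum at work, here is a minimal sketch that runs the query on the question's sample data through Python's sqlite3 module (an assumption for the sake of a self-contained example; it needs SQLite 3.25 or later). The inner query flags each product's first occurrence with rn = 1, and the outer window sum counts how many such flags have been seen up to the current row.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE mytable (rownum INTEGER, product TEXT)")
products = ["coke", "coke", "burger", "burger",
            "chocolate", "apple", "coke", "burger"]
conn.executemany("INSERT INTO mytable VALUES (?, ?)",
                 list(enumerate(products, start=1)))

query = """
SELECT SUM(CASE WHEN rn = 1 THEN 1 ELSE 0 END) OVER (ORDER BY rownum) AS new_rownum,
       product
FROM (
    -- rn = 1 marks the first time each product appears
    SELECT m.*, ROW_NUMBER() OVER (PARTITION BY product ORDER BY rownum) AS rn
    FROM mytable AS m
) AS flagged
ORDER BY rownum
"""
running = conn.execute(query).fetchall()
print([n for n, _ in running])  # [1, 1, 2, 2, 3, 4, 4, 4]
```

The late repeats of coke and burger keep the value 4 because they add nothing to the running sum, which matches the goal table in the question.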
There are several different ways to accomplish this; pick the one you like best. This one finds the first row number per product, then collapses the holes with dense_rank() over that initial grouping. Note that it numbers each row by its product's first appearance, so with the sample data rows 7 (coke) and 8 (burger) would get 1 and 2 rather than the 4 shown in the goal.
with data as (
select *, min(rownum) over (partition by product) as minrow
from T
)
select dense_rank() over (order by minrow) as rownum, product
from data
order by rownum, data.rownum;
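
For comparison, this sketch runs the dense_rank() variant on the same sample data using Python's sqlite3 module (an assumption; SQLite 3.25 or later). The original column is aliased to orig_rownum, a name introduced here only to keep the final ORDER BY unambiguous. On this data every row receives the number of its product's first appearance, so the output differs from the question's goal table in the last two rows.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE T (rownum INTEGER, product TEXT)")
products = ["coke", "coke", "burger", "burger",
            "chocolate", "apple", "coke", "burger"]
conn.executemany("INSERT INTO T VALUES (?, ?)",
                 list(enumerate(products, start=1)))

query = """
WITH data AS (
    -- minrow is the first row number at which each product appears
    SELECT rownum AS orig_rownum, product,
           MIN(rownum) OVER (PARTITION BY product) AS minrow
    FROM T
)
SELECT DENSE_RANK() OVER (ORDER BY minrow) AS rownum, product
FROM data
ORDER BY orig_rownum
"""
ranked = conn.execute(query).fetchall()
print([n for n, _ in ranked])  # [1, 1, 2, 2, 3, 4, 1, 2]
```

Which behaviour is wanted depends on whether a repeated product should reuse its own number or carry the running count forward.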

group by in SQL, improvement of query

I am facing one small issue.
I have a table MY_CHART_TABLE(ID,REASON_CODE,QUANTITY)
101 CompFail 57
101 FitFinish 18
101 CompDamage 16
102 NoFail 57
102 NoFinish 18
103 FullDamage 16
The output I want is:
101 CompFail 57 3
101 FitFinish 18 3
101 CompDamage 16 3
102 NoFail 57 2
102 NoFinish 18 2
103 FullDamage 16 1
I need to append the number of rows per id at the end of each row. How can I do it?
I am using the query below:
SELECT
id,
reason_code,
quantity,
COUNT(*) OVER (PARTITION BY id )
FROM MY_CHART_TABLE;
Is there a better way to write the query? Can it be done using group by?
Your code is fine as is, but if you must use group by, you can use a derived table to obtain the counts and then join it back to the main table.
select a.*, b.counts
from MY_CHART_TABLE a
join (select id, count(*) as counts
from MY_CHART_TABLE
group by id) b on a.id=b.id;
For your problem, I would reiterate that your code is the cleaner and better way of doing this. Note that your code as is will count all the rows per id. If you only need to count distinct rows per id, you can add a group by. Window functions are applied after group by, so this way you'll be counting only distinct rows per id:
select id, reason_code, quantity, count(*) over (partition by id)
from MY_CHART_TABLE
group by id, reason_code, quantity;
You can also use a CTE with group by:
;with cnts as(
select
id,
[count] = count(*)
from MY_CHART_TABLE
group by id
)
select mct.*, cnts.[count]
from MY_CHART_TABLE mct
join cnts
on cnts.id = mct.id
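
The window-function query and the group-by-and-join query should return the same rows. The sketch below checks that on the question's sample data with Python's sqlite3 module (an assumption; SQLite 3.25 or later). The column alias cnt is introduced here for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE my_chart_table (id INTEGER, reason_code TEXT, quantity INTEGER)"
)
conn.executemany(
    "INSERT INTO my_chart_table VALUES (?, ?, ?)",
    [(101, "CompFail", 57), (101, "FitFinish", 18), (101, "CompDamage", 16),
     (102, "NoFail", 57), (102, "NoFinish", 18), (103, "FullDamage", 16)],
)

# Variant 1: window function, a single pass over the table.
window_sql = """
SELECT id, reason_code, quantity, COUNT(*) OVER (PARTITION BY id) AS cnt
FROM my_chart_table
ORDER BY id, reason_code
"""

# Variant 2: aggregate per id in a derived table, then join back.
join_sql = """
SELECT a.id, a.reason_code, a.quantity, b.cnt
FROM my_chart_table AS a
JOIN (SELECT id, COUNT(*) AS cnt FROM my_chart_table GROUP BY id) AS b
  ON a.id = b.id
ORDER BY a.id, a.reason_code
"""

window_rows = conn.execute(window_sql).fetchall()
join_rows = conn.execute(join_sql).fetchall()
print(window_rows == join_rows)  # True
```

The join form reads the table twice, which is one reason the answers above prefer the window function.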

Get specified row ranking number

Here is what the rows look like:
Id Gold
1 200
2 100
3 300
4 900
5 800
6 1000
What I want to achieve is getting the rank of the row whose Id equals 5, when the rows are ordered by Gold descending.
So after ordering, the intermediate rows should be (NOT RETURNED):
Id Gold
6 1000
4 900
5 800
And the SQL should just return 3, which is the ranking of Id = 5 row.
What is the most efficient way to achieve this?
You simply want top, I think:
select top 3 t.*
from t
order by gold desc;
If you want the ranking of id = 5:
select count(*)
from t
where t.gold >= (select t2.gold from t t2 where t2.id = 5);
Try this code, which uses Dense_rank():
WITH cte
AS (SELECT *,
Dense_rank()
OVER(
ORDER BY [Gold] DESC) AS rank
FROM your_table)
SELECT rank
FROM cte
WHERE id = 5
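
Both ways of getting the rank of Id = 5 can be checked against the question's data. This is a minimal sketch using Python's sqlite3 module (an assumption; SQLite 3.25 or later), with the table named t as in the first answer and the alias rnk introduced here for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (id INTEGER, gold INTEGER)")
conn.executemany(
    "INSERT INTO t VALUES (?, ?)",
    [(1, 200), (2, 100), (3, 300), (4, 900), (5, 800), (6, 1000)],
)

# Approach 1: count the rows with at least as much gold as the target row.
count_rank = conn.execute(
    "SELECT COUNT(*) FROM t "
    "WHERE gold >= (SELECT t2.gold FROM t AS t2 WHERE t2.id = ?)",
    (5,),
).fetchone()[0]

# Approach 2: rank every row with DENSE_RANK(), then pick the target row.
dense_rank = conn.execute(
    """
    WITH cte AS (
        SELECT id, DENSE_RANK() OVER (ORDER BY gold DESC) AS rnk
        FROM t
    )
    SELECT rnk FROM cte WHERE id = ?
    """,
    (5,),
).fetchone()[0]

print(count_rank, dense_rank)  # 3 3
```

The two agree here because all Gold values are distinct. With ties they can differ: the count includes every row tied with the target, while dense_rank() gives tied rows the same rank and leaves no gaps.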

Creating a Rank Column with Repeated Indexes

I want to output the following table:
User | Country | RANK
------------------------------
1 US 3
1 US 3
1 NZ 2
1 NZ 2
1 NZ 2
1 JP 1
2 US 2
2 US 2
2 US 2
2 CA 1
What I have is the 'User' and 'Country' columns and want to create the RANK column.
I tried to use the rank() function like this:
rank() over (partition by User, Country order by ct desc)
where ct is just the time of the event since the epoch. But instead of giving repeated numbers like 33 222 1, it ranks inside the partition, giving me 12 123 1.
I also tried row_number() with no success.
If I use rank() over (partition by User order by country desc) it works, but how can I guarantee that it also ranks by ct?
Any clues on how to do that?
You are quite vague about the schema of your data. But assuming you have data that looks like this:
User Country Unix_time(epoch)
1 US 1437888888
1 NZ 1437666666
2 US 1437777777
2 NZ 1435555555
I think this will work, but I can't test it as I don't have Hive on my laptop.
select c.*, b.rank
from my_table c
left outer join
(select user
, country
, rank() over (partition by user order by unix_time desc) as rank
from
(select user, country, max(unix_time) as unix_time
from my_table group by user, country
) a
) b
on c.user=b.user and c.country=b.country
;
Basically I am selecting the maximum value for the time stamp associated with each user and country. This can then be ranked and joined to the original dataset.
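
Since the answer was untested, here is a minimal sketch of the same idea run through Python's sqlite3 module rather than Hive (an assumption; SQLite 3.25 or later). It uses the answer's assumed sample rows plus one hypothetical extra event, (1, 'US', 1437000000), so that a country repeats. The rank alias is renamed to rnk for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE my_table (user INTEGER, country TEXT, unix_time INTEGER)")
conn.executemany(
    "INSERT INTO my_table VALUES (?, ?, ?)",
    [(1, "US", 1437888888), (1, "NZ", 1437666666),
     (2, "US", 1437777777), (2, "NZ", 1435555555),
     (1, "US", 1437000000)],  # hypothetical repeated event for user 1 in US
)

query = """
SELECT c.user, c.country, c.unix_time, b.rnk
FROM my_table AS c
LEFT JOIN (
    -- rank each (user, country) pair by its latest event time
    SELECT user, country,
           RANK() OVER (PARTITION BY user ORDER BY unix_time DESC) AS rnk
    FROM (
        SELECT user, country, MAX(unix_time) AS unix_time
        FROM my_table
        GROUP BY user, country
    ) AS a
) AS b
  ON c.user = b.user AND c.country = b.country
ORDER BY c.user, b.rnk, c.unix_time DESC
"""
rows = conn.execute(query).fetchall()
print([(u, c, r) for u, c, _, r in rows])
# [(1, 'US', 1), (1, 'US', 1), (1, 'NZ', 2), (2, 'US', 1), (2, 'NZ', 2)]
```

Every row for the same user and country receives the same rank, which is the repeated-number pattern the question asks for. Ordering the window by unix_time ascending instead would reverse the numbering.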