SQL select 1 row out of several rows that have similar values

I have a table like this:
ID | OtherID | Date
1 | z | 2022-09-19
1 | b | 2021-04-05
2 | e | 2022-04-05
3 | t | 2022-07-08
3 | z | 2021-03-02
I want a table like this:
ID | OtherID | Date
1 | z | 2022-09-19
2 | e | 2022-04-05
3 | t | 2022-07-08
It should contain one ID-OtherID pair per ID, chosen by the most recent Date value.
The problem I have now is that the relationship between ID and OtherID is 1:M.
I've looked at SELECT DISTINCT, GROUP BY, LAG but I couldn't figure it out. I'm sorry if this is a duplicate question. I couldn't find the right keywords to search for the answer.
Update: I use Postgres, but I would like to know how to do this in other SQL dialects as well.

This works in many DBMSs (recent versions of Postgres, MySQL and others), but you may need to adapt it for other systems. You could use a CTE, a join, or a subquery such as this:
select id, otherid, date
from (
    select id, otherid, date,
           rank() over (partition by id order by date desc) as id_rank
    from my_table
) z
where id_rank = 1
id | otherid | date
1 | z | 2022-09-19T00:00:00.000Z
2 | e | 2022-04-05T00:00:00.000Z
3 | t | 2022-07-08T00:00:00.000Z

You can use a Common Table Expression (CTE) with ROW_NUMBER() to number the rows within each ID from the most recent Date to the oldest, then keep only the first row for each ID with WHERE rn = 1:
WITH cte AS (
    SELECT ID,
           OtherID,
           Date,
           ROW_NUMBER() OVER (PARTITION BY ID ORDER BY Date DESC) AS rn
    FROM sample_table
)
SELECT ID,
       OtherID,
       Date
FROM cte
WHERE rn = 1;
Result:
ID | OtherID | Date
1 | z | 2022-09-19
2 | e | 2022-04-05
3 | t | 2022-07-08
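For anyone who wants to try the ROW_NUMBER() approach without a database server, here is a minimal sketch using Python's built-in sqlite3 module. It assumes a SQLite build with window-function support (3.25 or newer), stores the dates as ISO strings so they sort correctly as text, and uses a hypothetical helper name `latest_row_per_id`.

```python
import sqlite3

def latest_row_per_id(rows):
    """Return one (id, other_id, date) row per id: the row with the latest date."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE sample_table (id INTEGER, other_id TEXT, date TEXT)")
    conn.executemany("INSERT INTO sample_table VALUES (?, ?, ?)", rows)
    query = """
        WITH cte AS (
            SELECT id, other_id, date,
                   ROW_NUMBER() OVER (PARTITION BY id ORDER BY date DESC) AS rn
            FROM sample_table
        )
        SELECT id, other_id, date
        FROM cte
        WHERE rn = 1
        ORDER BY id
    """
    result = conn.execute(query).fetchall()
    conn.close()
    return result

# Sample data copied from the question.
sample_rows = [
    (1, "z", "2022-09-19"),
    (1, "b", "2021-04-05"),
    (2, "e", "2022-04-05"),
    (3, "t", "2022-07-08"),
    (3, "z", "2021-03-02"),
]

if __name__ == "__main__":
    for row in latest_row_per_id(sample_rows):
        print(row)
```

ROW_NUMBER() always keeps exactly one row per ID, whereas RANK() would keep every row tied for the latest date.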

Related

Average and sort by this based on other conditional columns in a table

I have a table in SQL Server 2017 like below:
Name Rank1 Rank2 Rank3 Rank4
Jack null 1 1 3
Mark null 3 2 2
John null 2 3 1
What I need to do is to add an average rank column then rank those names based on those scores. We ignore null ranks. Expected output:
Name Rank1 Rank2 Rank3 Rank4 AvgRank FinalRank
Jack null 1 1 3 1.66 1
Mark null 3 2 2 2.33 3
John null 2 3 1 2 2
My query now looks like this:
;with cte as (
    select *, AvgRank = (Rank1+Rank2+Rank3+Rank4)/#NumOfRankedBy
    from mytable
)
select *, FinalRank = row_number() over (order by AvgRank)
from cte
I am stuck at finding the value of #NumOfRankedBy, which should be 3 in our case because Rank1 is null for all.
What is the best way to approach such an issue?
Thanks.
Your conundrum stems from the fact that your table is not normalised and you are treating data (Rank) as structure (columns).
You should have a table for ranks where each rank is a row; then your query is easy.
You can unpivot your columns into rows and then make use of avg, which ignores null values:
select *, FinalRank = row_number() over (order by AvgRank)
from mytable
cross apply (
    select Avg(r * 1.0) AvgRank
    from (values (rank1), (rank2), (rank3), (rank4)) r(r)
) r;
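The same unpivot-then-average idea can be tried locally with Python's sqlite3 module. This is only a sketch: SQLite has no CROSS APPLY, so it unpivots with UNION ALL in a CTE instead, and it assumes SQLite 3.25+ for ROW_NUMBER(). The helper name `rank_by_average` and the lower-case column names are illustrative.

```python
import sqlite3

def rank_by_average(rows):
    """Return (name, avg_rank, final_rank) tuples, averaging only non-NULL ranks."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE mytable (name TEXT, rank1 INTEGER, rank2 INTEGER, "
        "rank3 INTEGER, rank4 INTEGER)"
    )
    conn.executemany("INSERT INTO mytable VALUES (?, ?, ?, ?, ?)", rows)
    query = """
        WITH unpivoted AS (
            -- one row per (name, rank value); NULL ranks are kept here
            SELECT name, rank1 AS r FROM mytable
            UNION ALL SELECT name, rank2 FROM mytable
            UNION ALL SELECT name, rank3 FROM mytable
            UNION ALL SELECT name, rank4 FROM mytable
        ),
        averages AS (
            -- AVG skips NULLs, so the divisor is the count of non-NULL ranks
            SELECT name, AVG(r) AS avg_rank
            FROM unpivoted
            GROUP BY name
        )
        SELECT name, avg_rank,
               ROW_NUMBER() OVER (ORDER BY avg_rank) AS final_rank
        FROM averages
        ORDER BY final_rank
    """
    result = conn.execute(query).fetchall()
    conn.close()
    return result

# Sample data copied from the question (None stands for NULL).
sample_ranks = [
    ("Jack", None, 1, 1, 3),
    ("Mark", None, 3, 2, 2),
    ("John", None, 2, 3, 1),
]

if __name__ == "__main__":
    for name, avg_rank, final_rank in rank_by_average(sample_ranks):
        print(name, round(avg_rank, 2), final_rank)
```

Because the aggregate skips NULLs, there is no need to compute the number of ranked columns separately.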

Get certain rows, plus rows before and after

Let's say I have the following data set:
ID | Identifier | Admission_Date | Release_Date
234 | 2 | 5/1/22 | 5/5/22
234 | 1 | 4/25/22 | 4/30/22
234 | 2 | 4/20/22 | 4/24/22
234 | 2 | 4/15/22 | 4/18/22
789 | 1 | 7/15/22 | 7/19/22
789 | 2 | 7/8/22 | 7/14/22
789 | 2 | 7/1/22 | 7/5/22
321 | 2 | 6/1/21 | 6/3/21
321 | 2 | 5/27/21 | 5/31/21
321 | 1 | 5/20/21 | 5/26/21
321 | 2 | 5/15/21 | 5/19/21
321 | 2 | 5/6/21 | 5/10/21
I want all rows with identifier=1. I also want rows that are either directly below or above rows with Identifier=1 - sorted by most recent to least recent.
There is always a row below rows with identifier=1. There may or may not be a row above. If there is no row with identifier=1 for an ID, then it will not be brought in with a prior step.
The resulting data set should be as follows:
ID | Identifier | Admission Date | Release Date
234 | 2 | 5/1/22 | 5/5/22
234 | 1 | 4/25/22 | 4/30/22
234 | 2 | 4/20/22 | 4/24/22
789 | 1 | 7/15/22 | 7/19/22
789 | 2 | 7/8/22 | 7/14/22
321 | 2 | 5/27/21 | 5/31/21
321 | 1 | 5/20/21 | 5/26/21
321 | 2 | 5/15/21 | 5/19/21
I am using DBeaver, connected to PostgreSQL.
I admittedly don't know Postgres well, so the following could possibly be optimised. However, using a combination of lag and lead to obtain the previous and next dates (assuming Admission_Date is the column to order by), you could try:
with d as (
    select *,
        case when identifier = 1 then lag(admission_date) over (partition by id order by admission_date desc) end as pd,
        case when identifier = 1 then lead(admission_date) over (partition by id order by admission_date desc) end as nd
    from t
)
select id, identifier, admission_date, release_date
from d
where identifier = 1
   or exists (
        select * from d d2
        where d2.id = d.id
          and (d.admission_date = d2.pd or d.admission_date = d2.nd)
   )
order by id, admission_date desc;
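A simpler variation on the lag/lead idea is to look at the neighbouring rows' identifier instead of their dates: a row qualifies if it has identifier = 1 itself, or if the row just before or just after it (within the same id) does. The sketch below is not the query above but a variant of it, run through Python's sqlite3 module. It assumes SQLite 3.25+ and dates stored as ISO strings so that ordering works; the helper name `rows_around_identifier_one` is hypothetical.

```python
import sqlite3

def rows_around_identifier_one(rows):
    """Return rows with identifier = 1 plus their direct neighbours per id."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE t (id INTEGER, identifier INTEGER, "
        "admission_date TEXT, release_date TEXT)"
    )
    conn.executemany("INSERT INTO t VALUES (?, ?, ?, ?)", rows)
    query = """
        SELECT id, identifier, admission_date, release_date
        FROM (
            SELECT t.*,
                   LAG(identifier) OVER (
                       PARTITION BY id ORDER BY admission_date DESC
                   ) AS prev_identifier,
                   LEAD(identifier) OVER (
                       PARTITION BY id ORDER BY admission_date DESC
                   ) AS next_identifier
            FROM t
        ) AS marked
        WHERE identifier = 1 OR prev_identifier = 1 OR next_identifier = 1
        ORDER BY id, admission_date DESC
    """
    result = conn.execute(query).fetchall()
    conn.close()
    return result

# Sample data from the question, with dates rewritten as ISO strings.
sample_stays = [
    (234, 2, "2022-05-01", "2022-05-05"),
    (234, 1, "2022-04-25", "2022-04-30"),
    (234, 2, "2022-04-20", "2022-04-24"),
    (234, 2, "2022-04-15", "2022-04-18"),
    (789, 1, "2022-07-15", "2022-07-19"),
    (789, 2, "2022-07-08", "2022-07-14"),
    (789, 2, "2022-07-01", "2022-07-05"),
    (321, 2, "2021-06-01", "2021-06-03"),
    (321, 2, "2021-05-27", "2021-05-31"),
    (321, 1, "2021-05-20", "2021-05-26"),
    (321, 2, "2021-05-15", "2021-05-19"),
    (321, 2, "2021-05-06", "2021-05-10"),
]

if __name__ == "__main__":
    for row in rows_around_identifier_one(sample_stays):
        print(row)
```

The result is ordered by id here, so the groups appear as 234, 321, 789 rather than in the order shown in the question.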
One way:
SELECT (x.my_row).*  -- decompose fields from row type
FROM  (
   SELECT identifier
        , lag(t)  OVER w AS t0  -- take whole row
        , t              AS t1
        , lead(t) OVER w AS t2
   FROM   tbl t
   WINDOW w AS (PARTITION BY id ORDER BY admission_date)
   ) sub
CROSS  JOIN LATERAL (
   VALUES (t0), (t1), (t2)  -- pivot
   ) x(my_row)
WHERE  sub.identifier = 1
AND    (x.my_row).id IS NOT NULL;  -- exclude rows with NULL ( = missing row)
The query is designed to only make a single pass over the table.
Uses some advanced SQL / Postgres features.
About LATERAL:
What is the difference between a LATERAL JOIN and a subquery in PostgreSQL?
About the VALUES expression:
Postgres: convert single row to multiple rows (unpivot)
The manual about extracting fields from a composite type.
If there are many rows per id, other solutions will be (much) faster - with proper index support. You did not specify ...

How to select the last value which is not null?

I have the following table:
id a b
1 1 kate
1 4 null
1 3 paul
1 3 paul
1 2 lola
2 1 kim
2 9 null
2 2 null
In result it should be this:
1 3 paul
2 1 kim
I want to get the last a where b is not null. Something like:
select b
from (select b,
             row_number() over (partition by id order by a desc) as num
      from tbl) as f
where num = 1
But this way I get a null value, because the row with the last a (a = 4) has b IS NULL. Maybe there is a way to reproduce the ffill method from pandas?
Assuming:
a is defined NOT NULL.
You want the row with the greatest a where b IS NOT NULL - per id.
SELECT DISTINCT ON (id) *
FROM tbl
WHERE b IS NOT NULL
ORDER BY id, a DESC;
Detailed explanation:
Select first row in each GROUP BY group?
Try:
select id, a, b
from (select id, a, b,
             row_number() over (partition by id order by a desc nulls last) as num
      from unnamedTable) t
where num = 1
Or, if that isn't right, try it with nulls first. I can never remember which way it works with desc.
If you aren't guaranteed to have at least one non-null per id then you'll want to move nulls to the bottom of the list rather than filtering those rows out entirely.
select id, a, b
from (
    select id, a, b,
           row_number() over (
               partition by id
               order by case when b is not null then 0 else 1 end, a desc
           ) as num
    from tbl
) as f
where num = 1
You can wrap this in a CTE and join it back to the main table if you wish to keep the original columns as they are, but looking at your expected output and logic, this should do it. Having said that, a row_number()-based approach might be a tad faster.
select distinct
    id,
    max(a) over (partition by id) as a,
    first_value(b) over (partition by id order by a desc) as b
from tbl
where b is not null;
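To check how the row_number() variant that sorts NULLs to the bottom behaves, including for an id whose b values are all NULL, here is a small sketch with Python's sqlite3 module. It assumes SQLite 3.25+ and the table name tbl used in the answers; the helper name `last_non_null_per_id` is hypothetical.

```python
import sqlite3

def last_non_null_per_id(rows):
    """Return, per id, the row with the greatest a among rows where b is not NULL.

    If every b for an id is NULL, the row with the greatest a is returned instead.
    """
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE tbl (id INTEGER, a INTEGER NOT NULL, b TEXT)")
    conn.executemany("INSERT INTO tbl VALUES (?, ?, ?)", rows)
    query = """
        SELECT id, a, b
        FROM (
            SELECT id, a, b,
                   ROW_NUMBER() OVER (
                       PARTITION BY id
                       -- non-NULL b first, then the greatest a
                       ORDER BY CASE WHEN b IS NOT NULL THEN 0 ELSE 1 END, a DESC
                   ) AS num
            FROM tbl
        ) AS f
        WHERE num = 1
        ORDER BY id
    """
    result = conn.execute(query).fetchall()
    conn.close()
    return result

# Sample data copied from the question (None stands for NULL).
sample_tbl = [
    (1, 1, "kate"),
    (1, 4, None),
    (1, 3, "paul"),
    (1, 3, "paul"),
    (1, 2, "lola"),
    (2, 1, "kim"),
    (2, 9, None),
    (2, 2, None),
]

if __name__ == "__main__":
    for row in last_non_null_per_id(sample_tbl):
        print(row)
```

Filtering with WHERE b IS NOT NULL instead would drop an id entirely when all of its b values are NULL, which is the difference between the two styles of answer.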

How to get MAX Hike in Min month?

Below is the table:
Name | Hike% | Month
------------------------
A 7 1
A 6 2
A 8 3
b 4 1
b 7 2
b 7 3
Result should be:
Name | Hike% | Month
------------------------
A 8 3
b 7 2
Here is one way of doing this:
SELECT Name, [Hike%], Month
FROM
(
    SELECT *, ROW_NUMBER() OVER (PARTITION BY Name ORDER BY [Hike%] DESC, Month) rn
    FROM yourTable
) t
WHERE rn = 1
ORDER BY Name;
If you instead want to return multiple records per name, in the case where two or more records might be tied for having the greatest hike%, then replace ROW_NUMBER with RANK.
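The tie-breaking behaviour of ORDER BY hike DESC, month can be checked with a small sketch in Python's sqlite3 module. It assumes SQLite 3.25+; the table name hikes, the column name hike_pct (standing in for Hike%) and the helper name `max_hike_earliest_month` are all illustrative.

```python
import sqlite3

def max_hike_earliest_month(rows):
    """Return per name the row with the highest hike; ties go to the earliest month."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE hikes (name TEXT, hike_pct INTEGER, month INTEGER)")
    conn.executemany("INSERT INTO hikes VALUES (?, ?, ?)", rows)
    query = """
        SELECT name, hike_pct, month
        FROM (
            SELECT name, hike_pct, month,
                   ROW_NUMBER() OVER (
                       PARTITION BY name
                       ORDER BY hike_pct DESC, month
                   ) AS rn
            FROM hikes
        ) AS t
        WHERE rn = 1
        ORDER BY name
    """
    result = conn.execute(query).fetchall()
    conn.close()
    return result

# Sample data copied from the question.
sample_hikes = [
    ("A", 7, 1),
    ("A", 6, 2),
    ("A", 8, 3),
    ("b", 4, 1),
    ("b", 7, 2),
    ("b", 7, 3),
]

if __name__ == "__main__":
    for row in max_hike_earliest_month(sample_hikes):
        print(row)
```

For name b, months 2 and 3 tie on the highest hike, and the second sort key picks month 2.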
Use a correlated subquery:
select Name, min([Hike%]) as [Hike%], min(Month) as Month
from
(
    select * from tablename a
    where [Hike%] in (select max([Hike%]) from tablename b where a.Name = b.Name)
) A
group by Name
You can use something similar to the below:
SELECT Name, MAX(Hike), Month
FROM table
GROUP BY Name, Month
Hope this helps :)

MAX function without group by

I have the following table:
ID | NUM
1 | 4
2 | 9
3 | 1
4 | 7
5 | 10
I want a result of:
ID | NUM
5 | 10
When I try to use MAX(NUM), I get an error saying that I have to use GROUP BY in order to use the MAX function.
Any idea?
As per the error, use of an aggregate like MAX requires a GROUP BY clause if there are any non-aggregated columns in the select list (in your case, you are trying to find MAX(NUM) and then return the associated value(s) in the ID column). In MS SQL Server you can get what you want via ordering and limiting the returned rows:
SELECT TOP 1 ID, NUM
FROM [table]
ORDER BY NUM DESC;
In other RDBMSs, LIMIT offers similar functionality.
Edit
If you need to return all rows which have the same maximum, then use the WITH TIES qualification:
SELECT TOP 1 WITH TIES ID, NUM
FROM [table]
ORDER BY NUM DESC;
May return more than 1 result:
SELECT id, num
FROM table
WHERE num = (SELECT MAX(num) FROM table)
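The scalar-subquery form is portable and easy to try with Python's sqlite3 module. The sketch below assumes a table called nums (since table is a reserved word) and a hypothetical helper name `rows_with_max_num`; it shows that all tied rows come back.

```python
import sqlite3

def rows_with_max_num(rows):
    """Return every (id, num) row whose num equals the overall maximum."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE nums (id INTEGER, num INTEGER)")
    conn.executemany("INSERT INTO nums VALUES (?, ?)", rows)
    query = """
        SELECT id, num
        FROM nums
        WHERE num = (SELECT MAX(num) FROM nums)
        ORDER BY id
    """
    result = conn.execute(query).fetchall()
    conn.close()
    return result

# Sample data copied from the question.
sample_nums = [(1, 4), (2, 9), (3, 1), (4, 7), (5, 10)]

if __name__ == "__main__":
    print(rows_with_max_num(sample_nums))
    # Adding a second row with the same maximum returns both rows.
    print(rows_with_max_num(sample_nums + [(6, 10)]))
```

If exactly one row is wanted even when there are ties, ordering by num descending and limiting to one row is the alternative.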
Try this query:
WITH result AS
(
    SELECT DENSE_RANK() OVER (ORDER BY NUM DESC) AS RowNo, ID, NUM
    FROM #emp
)
SELECT ID, NUM FROM result WHERE RowNo = 1
It will return all rows with the maximum value, even if several rows share it, for example:
ID | NUM
5 | 10
6 | 10
Refer to the link below to learn more about ranking functions:
http://msdn.microsoft.com/en-us/library/ms189798
How about:
SELECT TOP 1 ID,NUM FROM table ORDER BY NUM DESC;
Do this -
SELECT TOP 1 ID,
NUM
FROM <yourtable>
ORDER BY NUM DESC;
This gets all rows that have the maximum value, but it uses three SELECTs, which is not good for performance:
SELECT id, MAX(num) AS num
FROM table
GROUP BY id
ORDER BY MAX(num) DESC
LIMIT (SELECT COUNT(*)
       FROM table
       WHERE num = (SELECT MAX(num) FROM table)
      )