Select top row based on grouping - sql

I think it is a common situation, but I am not able to get the logic.
I have a table as follows.
PersonID SchoolID EndDate
-------- -------- -------
1 ABC 2013
1 DEF 2014
1 GHI 2010
2 XYZ 2013
2 UVW 2011
I want the following output
PersonID SchoolID EndDate
-------- -------- -------
1 DEF 2014
2 XYZ 2013
Basically, I want the latest school for each person, so I tried something like
SELECT SchoolID, PersonID,EndDate FROM tbl
GROUP BY PersonID
HAVING EndDate = MAX(ENDDATE)
ORDER BY EndDate DESC
But I got an error saying EndDate is invalid in a HAVING clause because it is not contained in an aggregate function or group by clause.
I tried doing this
SELECT SchoolID, PersonID,MAX(EndDate) FROM tbl
GROUP BY PersonID
ORDER BY EndDate DESC
I get an error saying SchoolID is invalid in the select list because of the same reason.
What am I missing here?

with cte as (SELECT *,
ROW_NUMBER() OVER(PARTITION BY PersonID
ORDER BY EndDate DESC) AS RN
FROM Table1)
select PersonId, SchoolId, EndDate from cte
where RN = 1

You have to wrap MAX(EndDate) in a subquery.
SELECT SchoolID, PersonID, EndDate
FROM table1 t
WHERE EndDate =
(SELECT MAX(EndDate) FROM table1
WHERE PersonID = t.PersonID);
Note: this will give multiple rows for one PersonID if there are multiple dates tied for the max.
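The tie behaviour noted above can be checked with a small sketch. It uses Python's built-in sqlite3 module purely as a convenient test bed (an assumption; the question itself does not use SQLite), loads the sample rows from the question into a table named tbl, and adds one hypothetical row (1, 'JKL', 2014) that ties with DEF for person 1.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tbl (PersonID INTEGER, SchoolID TEXT, EndDate INTEGER)")
rows = [
    (1, "ABC", 2013), (1, "DEF", 2014), (1, "GHI", 2010),
    (2, "XYZ", 2013), (2, "UVW", 2011),
    (1, "JKL", 2014),  # hypothetical extra row: ties with DEF for person 1
]
conn.executemany("INSERT INTO tbl VALUES (?, ?, ?)", rows)

# Correlated subquery: keep every row whose EndDate equals the
# maximum EndDate of the same person.
latest = conn.execute(
    """
    SELECT PersonID, SchoolID, EndDate
    FROM tbl t
    WHERE EndDate = (SELECT MAX(EndDate) FROM tbl WHERE PersonID = t.PersonID)
    ORDER BY PersonID, SchoolID
    """
).fetchall()

for row in latest:
    print(row)
```

Person 1 comes back twice because both schools share the maximum EndDate. If exactly one row per person is required, a ROW_NUMBER()-based query with an extra tie-breaking sort key is one option.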

with temp as
(
SELECT PersonID, MAX(EndDate) as EndDate FROM tbl
GROUP BY PersonID
)
select t.* from tbl t inner join temp on t.PersonID = temp.PersonID
and t.EndDate = temp.EndDate;

Related

Delete duplicated record

I have a table which contains a lot of duplicated rows like this:
id_emp id date ch_in ch_out
1 34103 2019-09-01
1 34193 2019-09-01 17:00
1 34194 2019-09-02 07:03:21 16:59:26
1 34104 2019-09-02 07:03:21 16:59:26
1 33361 2019-09-02 NULL NULL
I want to keep just one row for each date and delete the others, so that the output is:
id_emp id date ch_in ch_out
1 34193 2019-09-01 17:00
1 34104 2019-09-02 07:03:21 16:59:26
I tried to use DISTINCT, but it did not work:
select distinct id_emp, id, date_1, ch_in,ch_out
from ch_inout
where id_emp=1 order by date_1 asc
I also tried this query to find the rows to delete:
select *
from (
select *, rn=row_number() over (partition by date_1 order by id)
from ch_inout
) x
where rn > 1;
But that does not work either; the result is empty.
You can use aggregation:
select id_emp, max(id) as id, date, min(ch_in), max(ch_out)
from ch_inout
group by id_emp, date;
This returns the maximum id for each group of rows. That is not exactly what is returned in the question, but you don't specify the logic.
EDIT:
If you want to delete all but the largest id for each id_emp/date combination, you can use:
delete c from ch_inout c
where id < (select max(c2.id)
from ch_inout c2
where c2.id_emp = c.id_emp and c2.date = c.date
);
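As a rough check of the "keep only the largest id per id_emp/date" idea, here is a sketch using Python's built-in sqlite3 module. Several things are assumptions made for illustration: SQLite does not accept the `delete c from ... c` alias form, so the statement is rewritten to reference the table name directly; the time column is called date as in the answer; and the lone 17:00 value is placed in ch_out because the question does not show which column it belongs to.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE ch_inout (id_emp INTEGER, id INTEGER, date TEXT, ch_in TEXT, ch_out TEXT)"
)
conn.executemany(
    "INSERT INTO ch_inout VALUES (?, ?, ?, ?, ?)",
    [
        (1, 34103, "2019-09-01", None, None),
        (1, 34193, "2019-09-01", None, "17:00"),  # assumption: 17:00 is ch_out
        (1, 34194, "2019-09-02", "07:03:21", "16:59:26"),
        (1, 34104, "2019-09-02", "07:03:21", "16:59:26"),
        (1, 33361, "2019-09-02", None, None),
    ],
)

# Delete every row whose id is smaller than the largest id
# recorded for the same employee on the same date.
conn.execute(
    """
    DELETE FROM ch_inout
    WHERE id < (SELECT MAX(c2.id)
                FROM ch_inout c2
                WHERE c2.id_emp = ch_inout.id_emp
                  AND c2.date = ch_inout.date)
    """
)

remaining = conn.execute(
    "SELECT id_emp, id, date FROM ch_inout ORDER BY date, id"
).fetchall()
print(remaining)
```

On this data the surviving ids are 34193 and 34194, which differs from the 34104 shown in the question's desired output; that is the mismatch the answer already points out.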
You can use ROW_NUMBER() to identify the records you want to delete. Assuming that you want to keep the record with the lowest id on each date:
SELECT *
FROM (
SELECT
t.*,
ROW_NUMBER() OVER(PARTITION BY date ORDER BY id) rn
FROM ch_inout t
) x
WHERE rn > 1
You can easily turn this into a DELETE statement:
WITH cte AS (
SELECT
t.*,
ROW_NUMBER() OVER(PARTITION BY date ORDER BY id) rn
FROM ch_inout t
)
DELETE FROM cte WHERE rn > 1

How to group by one column, aggregate by another column and get another column as result in postgresql?

This seems simple, but I haven't been able to find an answer in the last few hours.
I have a table request_state, where "id" is the primary key; it can have multiple entries with the same state_id. I want to get the id of the row with the max datetime within each state_id group.
So I tried this, but it gives the error "state_id" must appear in the GROUP BY clause or be used in an aggregate function
select id, state_id, max(datetime)
from request_state
group by id
But when I use the following query, I get multiple entries with the same state_id.
select id, state_id, max(datetime)
from request_state
group by id, state_id
My table:
id state_id date_time
cef 1 Jan 1
ter 1 Jan 2
ijk 1 Jan 3
uuu 2 Feb 1
rrr 2 Feb 2
This is what I want as my result,
id state_id date_time
__ ________ _________
ijk 1 Jan 3
rrr 2 Feb 2
You seem to want:
select max(id) as id, state_id, max(datetime)
from request_state
group by state_id;
If you want the row where datetime is maximum for each state, then use distinct on:
select distinct on (state_id) rs.*
from request_state rs
order by state_id, datetime desc;
Try this query:
select id, state_id, date_time from (
select id, state_id, date_time,
row_number() over (partition by state_id order by date_time desc) rn
from request_state
) a where rn = 1
You can use a correlated subquery:
select t.*
from request_state t
where date_time = (select max(date_time) from request_state t1 where t1.state_id = t.state_id);

SQL: Select id from table grouping by max year and name

I have the following data:
ID Year Name
1 2016 A
2 2015 A
3 2014 A
4 2014 B
5 2015 B
6 2010 C
7 2007 D
8 2008 D
9 2006 D
I need to query just the ID of the max date for each name group
Result: [1, 5, 6, 8 ]
which is really:
ID Year Name
1 2016 A
5 2015 B
6 2010 C
8 2008 D
I have the following, but don't know where to go from here
SELECT MAX(year) from table GROUP BY name
Ideally there should be no duplicate year and name groups, but duplicate records are possible. Since they would be duplicates, it would not matter which one is kept.
If you want one row per name then I would recommend distinct on:
select distinct on (name) t.*
from t
order by name, year desc;
If you have duplicates, then one solution is rank():
select id, year, name
from (select t.*, rank() over (partition by name order by year desc) as seqnum
from t
) t
where seqnum = 1;
One could use the row_number() analytic function, partitioned by name and ordered by year and ID descending, to get the max ID for the max year. You haven't indicated what you'd like to see if ties exist, but this would return one of them (the one with the highest ID).
SELECT *
FROM (SELECT ID
, year
, Name
, row_number() over (PARTITION BY Name ORDER BY Year Desc, ID Desc) RN
FROM tbl) Z
WHERE RN = 1
An alternative way to accomplish this is to use your query as an inline view and simply join it back to the base set.
SELECT *
FROM tbl A
INNER JOIN (SELECT max(year) mYear, name
FROM tbl
GROUP BY name) B
on A.year = B.myear
and A.Name = B.Name
Ties will be displayed. So if you have a name with two records having a max year of 2016, then both records would be returned.
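To make the difference between the two approaches concrete, the sketch below runs both against the sample data using Python's built-in sqlite3 module (an assumption made only to get something runnable; window functions need SQLite 3.25 or newer). The row (10, 2016, 'A') is hypothetical and is added only to create a tie for name A.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tbl (ID INTEGER, Year INTEGER, Name TEXT)")
conn.executemany(
    "INSERT INTO tbl VALUES (?, ?, ?)",
    [
        (1, 2016, "A"), (2, 2015, "A"), (3, 2014, "A"),
        (4, 2014, "B"), (5, 2015, "B"),
        (6, 2010, "C"),
        (7, 2007, "D"), (8, 2008, "D"), (9, 2006, "D"),
        (10, 2016, "A"),  # hypothetical row: ties with ID 1 for name A
    ],
)

# Join back to the per-name maximum year: every tied row is returned.
join_ids = [r[0] for r in conn.execute(
    """
    SELECT A.ID
    FROM tbl A
    INNER JOIN (SELECT MAX(Year) AS mYear, Name FROM tbl GROUP BY Name) B
      ON A.Year = B.mYear AND A.Name = B.Name
    ORDER BY A.ID
    """
)]

# ROW_NUMBER() with ID DESC as tie-breaker: exactly one row per name.
rn_ids = [r[0] for r in conn.execute(
    """
    SELECT ID
    FROM (SELECT ID,
                 ROW_NUMBER() OVER (PARTITION BY Name
                                    ORDER BY Year DESC, ID DESC) AS RN
          FROM tbl) Z
    WHERE RN = 1
    ORDER BY ID
    """
)]

print(join_ids)
print(rn_ids)
```

The join returns both ID 1 and ID 10 for name A, while the row_number() version keeps only ID 10, the higher of the two tied IDs.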

SQL Server get latest value by date

I have a SQL Server table that has project_id int, update_date datetime, update_text varchar(max)
The table has many updates per project_id. I need to fetch the latest by update_date for all project_id values.
Example:
project_id update_date update_text
1 2017/01/01 Happy new year.
2 2017/01/01 Nice update
2 2017/02/14 Happy Valentine's
3 2016/12/25 Merry Christmas
3 2017/01/01 A New year is a good thing
The query should get:
project_id update_date update_text
1 2017/01/01 Happy new year.
2 2017/02/14 Happy Valentine's
3 2017/01/01 A New year is a good thing
Using TOP 1 WITH TIES together with ROW_NUMBER():
select top 1 with ties
project_id, update_date, update_text
from projects
order by row_number() over (partition by project_id order by update_date desc)
rextester demo: http://rextester.com/MGUNU86353
returns:
+------------+-------------+----------------------------+
| project_id | update_date | update_text |
+------------+-------------+----------------------------+
| 1 | 2017-01-01 | Happy new year. |
| 2 | 2017-02-14 | Happy Valentine's |
| 3 | 2017-01-01 | A New year is a good thing |
+------------+-------------+----------------------------+
This query will give you the latest date for each project
Select Project_Id, Max (Update_Date) Max_Update_Date
From MyTable
Group By Project_Id
So join it back to the original table
Select Project_Id, Update_Date, Update_Text
From MyTable
Inner Join
(
Select Project_Id, Max (Update_Date) Max_Update_Date
From MyTable
Group By Project_Id
) MaxDates
On MyTable.Project_Id = MaxDates.Project_Id
And MyTable.Update_Date = MaxDates.Max_Update_Date
You can find the MAX(date) like:
SELECT * FROM [table]
INNER JOIN (SELECT project_id, date = MAX(update_date) FROM [table] GROUP BY project_id) AS a
ON [table].project_id = a.project_id AND update_date = date
or you can use ROW_NUMBER() like:
SELECT * FROM (
SELECT *, rownum = ROW_NUMBER() OVER (PARTITION BY project_id ORDER BY
update_date DESC) FROM [table]
) AS a WHERE rownum = 1
Note on the top answer:
While some may call it the best answer, it is often not the most efficient at query time. On my data, the following query is an order of magnitude faster.
SELECT
project_id, update_date, update_text
FROM
projects P
WHERE
update_date = (SELECT MAX(update_date) FROM projects WHERE project_id = P.project_id)
(If a project can have more than one update with the same update_date, this query returns all of them; you would then need MAX(update_text) with a GROUP BY to get a single row, since there is no way to know which update_text value is the valid one.)
You can do it as:
with CTE as (
SELECT project_id, MAX(update_date) update_date
FROM YourTable
GROUP BY project_id
)
SELECT cte.project_id, cte.update_date, max(t.update_text) update_text
FROM YourTable T inner join CTE on T.project_id = CTE.project_id
and T.update_date = CTE.update_date -- match only the rows at the latest date
group by cte.project_id, cte.update_date;

Group BY Having COUNT, but Order on a column not contained in group

I have a table from which I need the ID of the group (grouped by ID and Name) that has COUNT(*) = 3 and the latest set of timestamps.
In the example below, I want to retrieve ID 2, because it has 3 rows and the latest timestamps among such groups (ID 3 has the latest timestamps overall, but it doesn't have a count of 3).
But I don't understand how to order by Date, since I cannot include it in the GROUP BY clause because its values differ within a group:
SELECT TOP 1 ID
FROM TABLE
GROUP BY ID,Name
HAVING COUNT(ID) > 2
AND Name = 'ABC'
--ORDER BY Date DESC
Sample Data
ID Name Date
1 ABC 2015-05-27 08:00
1 ABC 2015-05-27 09:00
1 ABC 2015-05-27 10:00
2 ABC 2015-05-27 11:00
2 ABC 2015-05-27 12:00
2 ABC 2015-05-27 13:00
3 ABC 2015-05-27 14:00
3 ABC 2015-05-27 15:00
In SQL Server, you need to aggregate any column that is not in the GROUP BY list:
SELECT TOP 1 ID
FROM TABLE
WHERE Name = 'ABC'
GROUP BY ID,Name
HAVING COUNT(ID) > 2
ORDER BY MAX(Date) DESC
The name filter should be put before the group by for better performance, if you really need it.
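The query above can be tried against the sample data with a short sketch. It uses Python's built-in sqlite3 module as a stand-in (an assumption; the question targets SQL Server), so TOP 1 becomes LIMIT 1, and the table is given the hypothetical name mytable because TABLE is a reserved word.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE mytable (ID INTEGER, Name TEXT, Date TEXT)")
conn.executemany(
    "INSERT INTO mytable VALUES (?, ?, ?)",
    [
        (1, "ABC", "2015-05-27 08:00"),
        (1, "ABC", "2015-05-27 09:00"),
        (1, "ABC", "2015-05-27 10:00"),
        (2, "ABC", "2015-05-27 11:00"),
        (2, "ABC", "2015-05-27 12:00"),
        (2, "ABC", "2015-05-27 13:00"),
        (3, "ABC", "2015-05-27 14:00"),
        (3, "ABC", "2015-05-27 15:00"),
    ],
)

# Filter rows first (WHERE), keep groups with more than two rows (HAVING),
# then order the surviving groups by their latest timestamp.
latest_id = conn.execute(
    """
    SELECT ID
    FROM mytable
    WHERE Name = 'ABC'
    GROUP BY ID, Name
    HAVING COUNT(ID) > 2
    ORDER BY MAX(Date) DESC
    LIMIT 1
    """
).fetchone()[0]

print(latest_id)
```

ID 3 is dropped by the HAVING clause because it has only two rows, so the latest remaining group is ID 2.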
You could do it in a nested query.
Subquery:
SELECT ID
from TABLE
GROUP BY ID
HAVING Count(ID) > 2
That gives you the IDs you want. Put that in another query:
SELECT ID, Date
FROM Table
Where ID in (Subquery)
Order by Date DESC;
First get all desired IDs, that is, all IDs having a count > 2. Get the maximum date for each such ID. Then rank these records with ROW_NUMBER, giving the latest ID rank #1. Finally, remove all IDs that are not ranked #1.
select name, id
from
(
select
name, id, row_number() over (partition by name order by max_date desc) as rn
from
(
select name, id, max(date) as max_date
from mytable
--where name = 'ABC'
group by name, id
having count(*) > 2
) wanted_ids
) ranked_ids
where rn = 1;