Group BY Having COUNT, but Order on a column not contained in group - sql

I have a table from which I need the ID of the group (grouped on ID and Name) that has COUNT(*) = 3 and the latest set of timestamps.
In the example below I want to retrieve ID 2: it has 3 rows and the latest timestamps among such groups (ID 3 has the latest timestamps overall, but it doesn't have a count of 3).
But I don't understand how to order by Date, since I cannot put it in the GROUP BY clause because its values differ within each group:
SELECT TOP 1 ID
FROM TABLE
GROUP BY ID,Name
HAVING COUNT(ID) > 2
AND Name = 'ABC'
--ORDER BY Date DESC
Sample Data
ID Name Date
1 ABC 2015-05-27 08:00
1 ABC 2015-05-27 09:00
1 ABC 2015-05-27 10:00
2 ABC 2015-05-27 11:00
2 ABC 2015-05-27 12:00
2 ABC 2015-05-27 13:00
3 ABC 2015-05-27 14:00
3 ABC 2015-05-27 15:00

In SQL Server, you need to aggregate the columns that are not in the GROUP BY list:
SELECT TOP 1 ID
FROM TABLE
WHERE Name = 'ABC'
GROUP BY ID,Name
HAVING COUNT(ID) > 2
ORDER BY MAX(Date) DESC
If you really need the name filter, put it in the WHERE clause so that rows are filtered before grouping, which is usually better for performance.
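
To try this out without a SQL Server instance, here is a minimal sketch that runs the same idea against the sample data using Python's built-in sqlite3 module. SQLite is only a stand-in here, so LIMIT 1 replaces TOP 1, and the table name readings is made up for the example.

```python
import sqlite3

# Assumption: in-memory SQLite stands in for SQL Server; "readings" is a hypothetical table name.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE readings (ID INTEGER, Name TEXT, Date TEXT)")
rows = [
    (1, "ABC", "2015-05-27 08:00"),
    (1, "ABC", "2015-05-27 09:00"),
    (1, "ABC", "2015-05-27 10:00"),
    (2, "ABC", "2015-05-27 11:00"),
    (2, "ABC", "2015-05-27 12:00"),
    (2, "ABC", "2015-05-27 13:00"),
    (3, "ABC", "2015-05-27 14:00"),
    (3, "ABC", "2015-05-27 15:00"),
]
conn.executemany("INSERT INTO readings VALUES (?, ?, ?)", rows)

query = """
    SELECT ID
    FROM readings
    WHERE Name = 'ABC'          -- row filter runs before grouping
    GROUP BY ID, Name
    HAVING COUNT(ID) > 2        -- keep only groups with at least 3 rows
    ORDER BY MAX(Date) DESC     -- aggregate the non-grouped column to sort on it
    LIMIT 1                     -- SQLite equivalent of TOP 1
"""
latest_full_group = conn.execute(query).fetchone()[0]
print(latest_full_group)
```

ID 3 is dropped by the HAVING clause because it has only two rows, so the group with the latest MAX(Date) among the remaining ones is ID 2.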

You could do it in a nested query.
Subquery:
SELECT ID
from TABLE
GROUP BY ID
HAVING Count(ID) > 2
That gives you the IDs you want. Put that in another query:
SELECT ID, Date
FROM Table
Where ID in (Subquery)
Order by Date DESC;

First get all desired IDs, that is, all IDs having a count > 2. Get the maximum date for each such ID. Then rank these records with ROW_NUMBER, giving the ID with the latest date rank #1. Finally, keep only the IDs ranked #1.
select name, id
from
(
select
name, id, row_number() over (partition by name order by max_date desc) as rn
from
(
select name, id, max(date) as max_date
from mytable
--where name = 'ABC'
group by name, id
having count(*) > 2
) wanted_ids
) ranked_ids
where rn = 1;

Related

Alternate SQL Server query by performance?

Query which I am using:
select SUM(marks)
from Table1
where name = ?
and Date = (select top 1 Date
from Table1
where name =?
and Date < ?
order by Date desc)
Table1:
id name marks Date
1 abc 34 01/01/2021
2 abc 15 05/01/2021
3 abc 20 05/01/2021
4 def 34 05/01/2021
5 abc 12 10/01/2021
select sum(marks)
from Table1
where name ='abc'
and Date = (select top 1 Date
from Table1
where name = 'abc'
and Date < '10/01/2021'
order by Date desc)
Result 35
Using RANK() may take comparatively less time:
select sum(marks)
from
(
select *, rank() over (order by date desc) as rnk
from table1
where name ='abc' and Date < '10/01/2021'
) as we
where rnk=1
Result: 35
Explanation:
Your query uses a subquery in the WHERE clause and filters on the name abc twice. This version applies the filter once and puts the subquery in the FROM clause, which can save time.
Look at the demo here with time elapsed (have made some additional dummy data to check time)
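
As a quick check that the two forms agree, here is a rough sketch using Python's sqlite3 module in place of SQL Server (an assumption, so LIMIT 1 replaces TOP 1). The dates are read as DD/MM/YYYY and stored as ISO text so that they compare correctly; window functions need SQLite 3.25 or newer. It only compares results, not timings.

```python
import sqlite3

# Assumption: SQLite stands in for SQL Server; dates are stored as ISO text.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Table1 (id INTEGER, name TEXT, marks INTEGER, Date TEXT)")
conn.executemany(
    "INSERT INTO Table1 VALUES (?, ?, ?, ?)",
    [
        (1, "abc", 34, "2021-01-01"),
        (2, "abc", 15, "2021-01-05"),
        (3, "abc", 20, "2021-01-05"),
        (4, "def", 34, "2021-01-05"),
        (5, "abc", 12, "2021-01-10"),
    ],
)

# Original form: scalar subquery in the WHERE clause picks the latest earlier date.
subquery_form = """
    SELECT SUM(marks) FROM Table1
    WHERE name = :name
      AND Date = (SELECT Date FROM Table1
                  WHERE name = :name AND Date < :cutoff
                  ORDER BY Date DESC LIMIT 1)
"""

# RANK form: rows tied on the latest earlier date all get rank 1.
rank_form = """
    SELECT SUM(marks)
    FROM (SELECT marks, RANK() OVER (ORDER BY Date DESC) AS rnk
          FROM Table1
          WHERE name = :name AND Date < :cutoff) AS ranked
    WHERE rnk = 1
"""

params = {"name": "abc", "cutoff": "2021-01-10"}
subquery_total = conn.execute(subquery_form, params).fetchone()[0]
rank_total = conn.execute(rank_form, params).fetchone()[0]
print(subquery_total, rank_total)
```

Both queries sum the marks of rows 2 and 3 (15 + 20), matching the result of 35 given above. RANK rather than ROW_NUMBER matters here, because both rows share the same date and must both be kept.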

Oracle SQL: receive ID of grouped foreign key with smallest Date

I have the following table.
For each BID, I need the ID of the row with the smallest MODIFIED date.
ID BID MODIFIED
1 1 01.01.2020
2 1 01.07.2020
3 2 04.08.2020
4 2 04.06.2020
5 2 01.07.2020
6 2 01.10.2020
7 3 01.09.2020
Desired output:
ID BID MODIFIED
1 1 01.01.2020
4 2 04.06.2020
7 3 01.09.2020
So far I can get each BID with its smallest MODIFIED date, but not the ID that goes with it:
select BID, min(MODIFIED) from MY_TABLE group by BID
How can I get the ID as well?
Oracle has a "first" aggregation function, which uses the keep syntax:
select BID, min(MODIFIED),
min(id) keep (dense_rank first order by modified) as id
from MY_TABLE
group by BID;
A common alternative uses window functions:
select t.*
from (select t.*,
row_number() over (partition by bid order by modified asc) as seqnum
from my_table t
) t
where seqnum = 1;

Update a column based on other rows column value

I have a table t which looks like this
key fill store end_date status
1 123 1 2019-04-30 0
2 1234 1 2019-04-30 0
3 123 1 2019-05-01 0
Now I need to update the first record and set status=1, because the third record has the same fill and store values and is more recent.
Output:
key fill store end_date status
1 123 1 2019-04-30 1
2 1234 1 2019-04-30 0
3 123 1 2019-05-01 0
I tried calculating row_number and updating the column based on it, but I can't figure out how to use the result in the update clause.
update t set
status = 1
from (
select *
from (
select *
, row_number() over (partition by fill, store order by end_date desc) as row_num from t
) a
where row_num = 2
) b
This query updates all the records. What should I change in my query to get the expected result?
I think that you want:
with cte as (
select status, row_number() over(partition by fill, store order by end_date desc) rn
from t
)
update cte set status = 1 where rn > 1
In the common table expression, row_number() ranks records having the same fill and store by descending end_date. Then, the outer query sets status to 1 on rows that were not ranked first.
You can do a correlated subquery:
update my_table a
set status = 1
where exists (
select 1
from my_table b
where b.fill = a.fill
and b.store = a.store
and b.end_date > a.end_date
)
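
For anyone who wants to test the correlated-subquery version on the sample rows, here is a small sketch using Python's sqlite3 module. SQLite is an assumption (the question does not name the database), the table is called t as in the question, and the update target is left unaliased because that form works across SQLite versions.

```python
import sqlite3

# Assumption: SQLite stands in for the unnamed database; "key" is quoted because it is a keyword.
conn = sqlite3.connect(":memory:")
conn.execute(
    'CREATE TABLE t ("key" INTEGER, fill INTEGER, store INTEGER, end_date TEXT, status INTEGER)'
)
conn.executemany(
    "INSERT INTO t VALUES (?, ?, ?, ?, ?)",
    [
        (1, 123, 1, "2019-04-30", 0),
        (2, 1234, 1, "2019-04-30", 0),
        (3, 123, 1, "2019-05-01", 0),
    ],
)

# Flag every row for which a newer row with the same fill and store exists.
conn.execute("""
    UPDATE t
    SET status = 1
    WHERE EXISTS (
        SELECT 1
        FROM t AS newer
        WHERE newer.fill = t.fill
          AND newer.store = t.store
          AND newer.end_date > t.end_date
    )
""")

statuses = conn.execute('SELECT "key", status FROM t ORDER BY "key"').fetchall()
print(statuses)
```

Only key 1 is flagged: key 3 is the newest row of its (fill, store) pair, and key 2 has no other row with the same fill.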

How to group by one column, aggregate by another column and get another column as result in postgresql?

This seems like something simple, but I couldn't find an answer to it over the last few hours.
I have a table request_state, where "id" is the primary key and multiple entries can share the same state_id. I want to get the id of the row with the max datetime for each state_id.
So I tried this, but it gives the error "state_id" must appear in the GROUP BY clause or be used in an aggregate function
select id, state_id, max(datetime)
from request_state
group by id
But when I use the following query, I get multiple entries with the same state_id.
select id, state_id, max(datetime)
from request_state
group by id, state_id
My table:
id state_id date_time
cef 1 Jan 1
ter 1 Jan 2
ijk 1 Jan 3
uuu 2 Feb 1
rrr 2 Feb 2
This is what I want as my result,
id state_id date_time
__ ________ _________
ijk 1 Jan 3
rrr 2 Feb 2
You seem to want:
select max(id) as id, state_id, max(datetime)
from request_state
group by state_id;
If you want the row where datetime is maximum for each state_id, then use distinct on:
select distinct on (state_id) rs.*
from request_state rs
order by state_id, datetime desc;
Try this query:
select id, state_id, date_time from (
select id, state_id, date_time,
row_number() over (partition by state_id order by date_time desc) rn
from request_state
) a where rn = 1
You can use a correlated subquery:
select t.*
from request_state t
where date_time = (select max(date_time) from request_state t1 where t1.state_id = t.state_id);
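
Here is a rough, runnable sketch of the correlated-subquery approach on the sample rows, using Python's sqlite3 module as a stand-in for PostgreSQL (an assumption). The question gives only "Jan 1" style dates, so the year 2024 is made up to get sortable ISO strings.

```python
import sqlite3

# Assumption: SQLite stands in for PostgreSQL; the year 2024 is hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE request_state (id TEXT PRIMARY KEY, state_id INTEGER, date_time TEXT)"
)
conn.executemany(
    "INSERT INTO request_state VALUES (?, ?, ?)",
    [
        ("cef", 1, "2024-01-01"),
        ("ter", 1, "2024-01-02"),
        ("ijk", 1, "2024-01-03"),
        ("uuu", 2, "2024-02-01"),
        ("rrr", 2, "2024-02-02"),
    ],
)

# Keep each row whose date_time equals the maximum date_time of its own state_id.
query = """
    SELECT t.id, t.state_id, t.date_time
    FROM request_state AS t
    WHERE t.date_time = (SELECT MAX(t1.date_time)
                         FROM request_state AS t1
                         WHERE t1.state_id = t.state_id)
    ORDER BY t.state_id
"""
latest_per_state = conn.execute(query).fetchall()
print(latest_per_state)
```

This returns ijk for state 1 and rrr for state 2. Note that if two rows of the same state_id tie on the maximum date_time, this form returns both of them.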

selecting set of second lowest values

I have two columns of interest ID and Deadline:
ID Deadline (DD/MM/YYYY)
1 01/01/2017
1 05/01/2017
1 04/01/2017
2 02/01/2017
2 03/01/2017
2 06/02/2017
2 08/03/2017
Each ID can have multiple (n) deadlines. I need to select all rows where the Deadline is second lowest for each individual ID.
Desired output:
ID Deadline (DD/MM/YYYY)
1 04/01/2017
2 03/01/2017
Selecting minimum can be done by:
select min(deadline) from XXX group by ID
but I am lost with the "middle" values. I am using RPostgreSQL, but any idea helps.
Thanks for your help
One way is to use the ROW_NUMBER() window function:
SELECT id, deadline
FROM (
SELECT *, ROW_NUMBER() OVER (PARTITION BY id ORDER BY deadline) rn
FROM xxx
) q
WHERE rn = 2 -- get only second lowest ones
or with LATERAL
SELECT t.*
FROM (
SELECT DISTINCT id FROM xxx
) i JOIN LATERAL (
SELECT *
FROM xxx
WHERE id = i.id
ORDER BY deadline
OFFSET 1 LIMIT 1
) t ON (TRUE)
Output:
id | deadline
----+------------
1 | 2017-04-01
2 | 2017-03-01
Here is a dbfiddle demo
Applying ROW_NUMBER() to distinct records avoids returning the lowest date instead of the second lowest when there are duplicate records.
select ID,Deadline
from (
select ID,
Deadline,
ROW_NUMBER() over(partition by ID order by Deadline) RowNum
from (select distinct ID, Deadline from SourceTable) T
) Tbl
where RowNum = 2
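
To see why the DISTINCT step matters, here is a small sketch with Python's sqlite3 module (SQLite 3.25 or newer for window functions; SQLite is a stand-in, not the asker's PostgreSQL). It uses the question's data read as DD/MM/YYYY and stored as ISO text, plus one duplicated lowest deadline for ID 1 that is added purely for illustration.

```python
import sqlite3

# Assumption: SQLite stands in for PostgreSQL; the duplicate row is hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE SourceTable (ID INTEGER, Deadline TEXT)")
conn.executemany(
    "INSERT INTO SourceTable VALUES (?, ?)",
    [
        (1, "2017-01-01"),
        (1, "2017-01-01"),  # hypothetical duplicate of the lowest deadline
        (1, "2017-01-05"),
        (1, "2017-01-04"),
        (2, "2017-01-02"),
        (2, "2017-01-03"),
        (2, "2017-02-06"),
        (2, "2017-03-08"),
    ],
)

# Same outer query for both runs; only the row source changes.
template = """
    SELECT ID, Deadline
    FROM (
        SELECT ID, Deadline,
               ROW_NUMBER() OVER (PARTITION BY ID ORDER BY Deadline) AS RowNum
        FROM {source}
    ) AS numbered
    WHERE RowNum = 2
    ORDER BY ID
"""
plain = conn.execute(template.format(source="SourceTable")).fetchall()
deduped = conn.execute(
    template.format(source="(SELECT DISTINCT ID, Deadline FROM SourceTable) AS T")
).fetchall()
print(plain)
print(deduped)
```

Without DISTINCT, row number 2 for ID 1 is the duplicated lowest date (2017-01-01); with DISTINCT it is the second lowest distinct date (2017-01-04). ID 2 has no duplicates, so both runs give 2017-01-03.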