Find minimum value in groups of rows

In the SQL space (specifically T-SQL, SQL Server 2008), given this list of values:
Status Date
------ -----------------------
ACT 2012-01-07 11:51:06.060
ACT 2012-01-07 11:51:07.920
ACT 2012-01-08 04:13:29.140
NOS 2012-01-09 04:29:16.873
ACT 2012-01-21 12:39:37.607 <-- THIS
ACT 2012-01-21 12:40:03.840
ACT 2012-05-02 16:27:17.370
GRAD 2012-05-19 13:30:02.503
GRAD 2013-09-03 22:58:48.750
Generated from this query:
SELECT Status, Date
FROM Account_History
WHERE AccountNumber = '1234'
ORDER BY Date
The status for this particular object started at ACT, then changed to NOS, then back to ACT, then to GRAD.
What is the best way to get the minimum date from the latest "group" of records where Status = 'ACT'?

Here is a query that does this by identifying the groups of consecutive rows that share the same status and then using simple aggregation. (It uses the column names StudentStatus and WhenLastChanged; in the query from the question these would be Status and Date.)
select top 1 StudentStatus, min(WhenLastChanged) as WhenLastChanged
from (SELECT StudentStatus, WhenLastChanged,
(row_number() over (order by WhenLastChanged) -
row_number() over (partition by StudentStatus order by WhenLastChanged)
) as grp
FROM Account_History
WHERE AccountNumber = '1234'
) t
where StudentStatus = 'ACT'
group by StudentStatus, grp
order by WhenLastChanged desc;
The row_number() function assigns sequential numbers within groups of rows based on the date. For your data, the two row_number() values (overall, and within each status) and their difference are:
Status Date                    rn_all rn_status grp
------ ----------------------- ------ --------- ---
ACT    2012-01-07 11:51:06.060 1      1         0
ACT    2012-01-07 11:51:07.920 2      2         0
ACT    2012-01-08 04:13:29.140 3      3         0
NOS    2012-01-09 04:29:16.873 4      1         3
ACT    2012-01-21 12:39:37.607 5      4         1
ACT    2012-01-21 12:40:03.840 6      5         1
ACT    2012-05-02 16:27:17.370 7      6         1
GRAD   2012-05-19 13:30:02.503 8      1         7
GRAD   2013-09-03 22:58:48.750 9      2         7
Notice that the last column is constant for consecutive rows that have the same status.
The aggregation brings these together and chooses the latest (top 1 . . . order by date desc) of the first dates (min(date)).
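To make the row-number trick easier to follow outside the database, here is a minimal sketch of the same idea in plain Python (not T-SQL). The function name latest_group_start and the variable names are made up for illustration; the rows are the ones from the question.

```python
from collections import defaultdict

# (status, date) rows from the question, already sorted by date.
ROWS = [
    ("ACT", "2012-01-07 11:51:06.060"),
    ("ACT", "2012-01-07 11:51:07.920"),
    ("ACT", "2012-01-08 04:13:29.140"),
    ("NOS", "2012-01-09 04:29:16.873"),
    ("ACT", "2012-01-21 12:39:37.607"),
    ("ACT", "2012-01-21 12:40:03.840"),
    ("ACT", "2012-05-02 16:27:17.370"),
    ("GRAD", "2012-05-19 13:30:02.503"),
    ("GRAD", "2013-09-03 22:58:48.750"),
]


def latest_group_start(rows, status):
    """Earliest date in the most recent consecutive run of `status`, or None."""
    seen_per_status = defaultdict(int)  # stands in for the partitioned row_number()
    first_date_by_grp = {}              # grp -> min(date), only for the wanted status
    for overall_rn, (row_status, row_date) in enumerate(rows, start=1):
        seen_per_status[row_status] += 1
        grp = overall_rn - seen_per_status[row_status]
        if row_status == status:
            current = first_date_by_grp.get(grp, row_date)
            first_date_by_grp[grp] = min(current, row_date)
    # Equivalent of "top 1 ... order by min(date) desc":
    # the latest of the per-group minimums.
    return max(first_date_by_grp.values()) if first_date_by_grp else None


print(latest_group_start(ROWS, "ACT"))
```

For the sample data this prints 2012-01-21 12:39:37.607, the row marked THIS in the question. The timestamps are compared as strings, which only works because they are in a sortable ISO-like format.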
EDIT:
The query is easy to tweak for multiple account numbers. I probably should have written it that way to begin with, except the final selection is trickier. The results from this have the date for each status and account:
select AccountNumber, StudentStatus, min(WhenLastChanged) as WhenLastChanged
from (SELECT StudentStatus, WhenLastChanged, AccountNumber,
(row_number() over (partition by AccountNumber order by WhenLastChanged) -
row_number() over (partition by AccountNumber, studentstatus order by WhenLastChanged)
) as grp
FROM Account_History
) t
where StudentStatus = 'ACT'
group by AccountNumber, StudentStatus, grp
order by WhenLastChanged desc;
But you can't get the last one per account quite so easily. Another level of subqueries:
select AccountNumber, StudentStatus, WhenLastChanged
from (select AccountNumber, StudentStatus, min(WhenLastChanged) as WhenLastChanged,
row_number() over (partition by AccountNumber, StudentStatus order by min(WhenLastChanged) desc
) as seqnum
from (SELECT AccountNumber, StudentStatus, WhenLastChanged,
(row_number() over (partition by AccountNumber order by WhenLastChanged) -
row_number() over (partition by AccountNumber, studentstatus order by WhenLastChanged)
) as grp
FROM Account_History
) t
where StudentStatus = 'ACT'
group by AccountNumber, StudentStatus, grp
) t
where seqnum = 1;
This uses aggregation along with the window function row_number(). This is assigning sequential numbers to the groups (after aggregation), with the last date for each account getting a value of 1 (order by min(WhenLastChanged) desc). The outermost select then just chooses that row for each account.
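As a cross-check of what the per-account query is meant to return, here is a rough Python sketch. It swaps the row_number() difference for itertools.groupby over consecutive runs, so it is a different technique with the same intended result. The function name last_change_to and the rows for account 5678 are invented for illustration.

```python
from itertools import groupby

# (account, status, timestamp); account 1234 follows the question,
# account 5678 is hypothetical sample data.
HISTORY = [
    ("1234", "ACT", "2012-01-08 04:13:29.140"),
    ("1234", "NOS", "2012-01-09 04:29:16.873"),
    ("1234", "ACT", "2012-01-21 12:39:37.607"),
    ("1234", "ACT", "2012-01-21 12:40:03.840"),
    ("1234", "GRAD", "2012-05-19 13:30:02.503"),
    ("5678", "ACT", "2012-02-01 09:00:00.000"),
    ("5678", "ACT", "2012-02-03 09:00:00.000"),
    ("5678", "NOS", "2012-03-01 09:00:00.000"),
]


def last_change_to(history, status):
    """Map each account to the start of its most recent run of `status`."""
    # Sort by account, then by time, so runs come out in chronological order.
    ordered = sorted(history, key=lambda r: (r[0], r[2]))
    result = {}
    # groupby only merges *consecutive* rows with the same (account, status).
    for (account, run_status), run in groupby(ordered, key=lambda r: (r[0], r[1])):
        if run_status == status:
            # A later run overwrites an earlier one, leaving the latest run's start.
            result[account] = min(ts for _, _, ts in run)
    return result


print(last_change_to(HISTORY, "ACT"))
```

Accounts that never had the requested status are simply absent from the result, which matches the SQL version returning no row for them.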

SELECT [Status], MIN([Date])
FROM Table_Name
WHERE [Status] = (SELECT [Status]
FROM Table_Name
WHERE [Date] = (SELECT MAX([Date])
FROM Table_Name)
)
GROUP BY [Status]
Try here Sql Fiddle

Hogan: basically, yes. I just want to know the date/time when the
account was last changed to ACT. The records after the point above
marked THIS are just extra.
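Instead of just looking for ACT, we can find every row where the status changes and then select the latest such row for ACT.
So, every time the status changes:
with rownumb as
(
select *, row_number() OVER (order by [date] asc) as rn
from Account_History
where AccountNumber = '1234'
)
-- B is the row that starts a new status; A is the row just before it.
-- The left join keeps the very first row, which has no predecessor.
select B.status, B.[date]
from rownumb B
left join rownumb A on A.rn = B.rn - 1
where A.status is null or A.status <> B.status
Now find the max of the ACT items:
with rownumb as
(
select *, row_number() OVER (order by [date] asc) as rn
from Account_History
where AccountNumber = '1234'
), statuschange as
(
select B.status, B.[date]
from rownumb B
left join rownumb A on A.rn = B.rn - 1
where A.status is null or A.status <> B.status
)
select max([date])
from statuschange
where status = 'ACT'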

Related

Get last and first record using rank()

I need to get the first and last record (ordered by the Date column) from a table for a certain SSID. It is not a problem if there are more records with the same max or min date. All I need is a union all.
I am getting last record having max(date) with:
with c as (
select *, rnk = rank() over (partition by Date order by Date ASC)
from table
where SSID = '00921834800'
)
select top 1 Date, City, Title
from c
order by Date desc
How do I get the first record (min(Date)) as well (same thing, only with order by Date asc) with a single select and without using ranking again?
I'm using MSSQL 2017.

; with c as (
select *,
-- no "partition by Date" here: partitioning by the ordering column
-- would give every row rank 1
rnk = rank() over (order by Date ASC),
rnk2 = rank() over (order by Date desc)
from table
where SSID= '00921834800'
)
select Date,
City,
Title
from c
where rnk = 1 or rnk2 = 1
order by Date desc

I would use the following query:
select * from (select top 1 with ties * from t where ssid = '00921834800' order by date) as a
union all
select * from (select top 1 with ties * from t where ssid = '00921834800' order by date desc) as b

One other solution is:
with
c as
(
select *,
-- row_number() rather than rank(): with ties on the latest date,
-- rank() would never reach CNT
row_number() over (order by Date ASC) AS RNK,
count(*) OVER () AS CNT
from table
where SSID= '00921834800')
select Date, City, Title
from c
WHERE RNK = 1
OR CNT = RNK
order by Date desc
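If ranking functions are not needed at all, one more option is to compare each row against MIN and MAX subqueries. The sketch below runs that idea in SQLite through Python's sqlite3 module rather than on SQL Server, so treat it as an illustration only; the table name visits and the sample rows are made up.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
create table visits (SSID text, Date text, City text, Title text);
insert into visits values
  ('00921834800', '2020-01-01', 'Oslo',   'first'),
  ('00921834800', '2020-03-15', 'Bergen', 'middle'),
  ('00921834800', '2020-06-30', 'Tromso', 'last-a'),
  ('00921834800', '2020-06-30', 'Narvik', 'last-b'),
  ('11111111111', '2019-01-01', 'Paris',  'other');
""")

# Rows whose Date equals the minimum or the maximum Date for the SSID.
# Ties on either end are all returned, like TOP 1 WITH TIES.
QUERY = """
select Date, City, Title
from visits
where SSID = :ssid
  and (Date = (select min(Date) from visits where SSID = :ssid)
    or Date = (select max(Date) from visits where SSID = :ssid))
order by Date desc, City
"""

rows = conn.execute(QUERY, {"ssid": "00921834800"}).fetchall()
for row in rows:
    print(row)
```

With the sample rows this returns the two tied rows for 2020-06-30 followed by the single row for 2020-01-01, and skips the middle row and the other SSID.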

Get Earliest Date corresponding to the latest occurrence of a recurring name

I have a table with Name and Date columns. I want to get the earliest date when the current name appeared. For example:
Name Date
---- -----------
X    30-Jan-2021
X    29-Jan-2021
X    28-Jan-2021
Y    27-Jan-2021
Y    26-Jan-2021
Y    25-Jan-2021
Y    24-Jan-2021
X    23-Jan-2021
X    22-Jan-2021
Now when I try to get the earliest date when current name (X) started to appear, I want 28-Jan, but the sql query would give 22-Jan-2021 because that's when X appeared originally for the first time.
Update: This was the query I was using:
Select min(Date) from myTable where Name='X'
I am using older SQL Server 2008 (in the process of upgrading), so do not have access to LEAD/LAG functions.
The solutions suggested below do work as intended. Thanks.

This is a type of gaps-and-islands problem.
There are many solutions. Here is one that is optimized for your case:
Use LEAD/LAG to identify the first row in each grouping
Filter to only those rows
Number those rows and take the first one
WITH StartPoints AS (
SELECT *,
IsStart = CASE WHEN Name <> LEAD(Name, 1, '') OVER (ORDER BY Date DESC) THEN 1 END
FROM YourTable
),
Numbered AS (
SELECT *,
rn = ROW_NUMBER() OVER (PARTITION BY Name ORDER BY Date DESC)
FROM StartPoints
WHERE IsStart = 1 AND Name = 'X'
)
SELECT
Name, Date
FROM Numbered
WHERE rn = 1;
db<>fiddle
For SQL Server 2008 or earlier (which I strongly suggest you upgrade from), you can use a self-join with row-numbering to simulate LEAD/LAG
WITH RowNumbered AS (
SELECT *,
AllRn = ROW_NUMBER() OVER (ORDER BY Date ASC)
FROM YourTable
),
StartPoints AS (
SELECT r1.*,
IsStart = CASE WHEN r1.Name <> ISNULL(r2.Name, '') THEN 1 END
FROM RowNumbered r1
LEFT JOIN RowNumbered r2 ON r2.AllRn = r1.AllRn - 1
),
Numbered AS (
SELECT *,
rn = ROW_NUMBER() OVER (PARTITION BY Name ORDER BY Date DESC)
FROM StartPoints
WHERE IsStart = 1
)
SELECT
Name, Date
FROM Numbered
WHERE rn = 1;
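The "compare each row with the previous one" step can also be sketched in plain Python, which may help when checking the SQL against the sample data. This is only an illustration; the function name start_of_latest_run is made up, and the rows are the ones from the question.

```python
from datetime import date

# (name, date) rows from the question.
ROWS = [
    ("X", date(2021, 1, 30)), ("X", date(2021, 1, 29)), ("X", date(2021, 1, 28)),
    ("Y", date(2021, 1, 27)), ("Y", date(2021, 1, 26)), ("Y", date(2021, 1, 25)),
    ("Y", date(2021, 1, 24)), ("X", date(2021, 1, 23)), ("X", date(2021, 1, 22)),
]


def start_of_latest_run(rows, name):
    """Date on which the most recent consecutive run of `name` began, or None."""
    ordered = sorted(rows, key=lambda r: r[1])
    # A row is a start point when it is the first row overall or when the
    # previous row (by date) carries a different name: the LAG comparison.
    starts = [
        d
        for i, (n, d) in enumerate(ordered)
        if n == name and (i == 0 or ordered[i - 1][0] != n)
    ]
    # The latest start point corresponds to rn = 1 in the SQL.
    return max(starts) if starts else None


print(start_of_latest_run(ROWS, "X"))
```

For the sample data this prints 2021-01-28, not 2021-01-22, because the earlier run of X is a separate island.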

This is a gaps and island problem. Based on the sample data, this will work:
WITH Groups AS(
SELECT YT.[Name],
YT.[Date],
ROW_NUMBER() OVER (ORDER BY YT.Date DESC) -
ROW_NUMBER() OVER (PARTITION BY YT.[Name] ORDER BY Date DESC) AS Grp
FROM dbo.YourTable YT),
FirstGroup AS(
SELECT TOP (1) WITH TIES
G.[Name],
G.[Date]
FROM Groups G
WHERE [Name] = 'X'
ORDER BY Grp ASC)
SELECT MIN(FG.[Date]) AS Mi
FROM FirstGroup FG;
db<>fiddle

If I understand correctly, you want to know when X disappeared and then reappeared. In that case you can search for gaps in the dates within each name.
Here is an example of how to detect that:
SELECT name
,DATE
FROM (
SELECT *
,DATEDIFF(day, lead(DATE) OVER (
PARTITION BY name ORDER BY DATE DESC
), DATE) DIF
FROM YourTable
) a
WHERE DIF > 1

Codility SqlEventsDelta (Compute the difference between the latest and the second latest value for each event type)

Recently I have been practicing coding exercises on Codility.
You can find the problem in the Exercises 6 - SQL section.
Just start a test to see the problem description: SqlEventsDelta
Problem definition:
I wrote this solution to the SqlEventsDelta question in SQLite. It works fine in my local tool, but it does not work in the web tool.
Can anyone give any advice on how I can solve this problem?
※ I searched for this problem on Stack Overflow and I know there is better code than mine.
But, if possible, I want to use my own SQLite code logic and functions.
WITH cte1 AS
(
SELECT *, CASE WHEN e2.event_type = e2.prev THEN 0
WHEN e2.event_type = e2.next THEN 0
ELSE 1 END AS grp
FROM (SELECT *, LAG(e1.event_type) OVER(ORDER BY (SELECT 1)) AS prev , LEAD(e1.event_type) OVER(ORDER BY (SELECT 1)) AS next FROM events e1) e2
)
,cte2 AS
(
SELECT cte1.event_type, cte1.time, cte1.grp, cte1.value - LAG(cte1.value) OVER(ORDER BY cte1.event_type, cte1.time) AS value
FROM cte1
WHERE cte1.grp = 0
ORDER BY cte1.event_type, cte1.time
)
SELECT c2.event_type, c2.value
FROM cte2 c2
WHERE (c2.event_type, c2.time) IN (
SELECT c2.event_type, MAX(c2.time) AS time
FROM cte2 c2
GROUP BY c2.event_type)
GROUP BY c2.event_type
ORDER BY c2.event_type, c2.time
It ran just fine on my local tool(DB Browser for SQLite Version 3.12.2) without error.
event_type | value
-----------+-----------
2 | -5
3 | 4
Execution finished without errors.
Result: 2 rows returned in 7ms
But on the web tool (Codility test editor, SQLite version 3.11.0) it does not run, and I get the following errors.
| Compilation successful.
| Example test: (example test)
| Output (stderr):
| error on query: ...
| ...
| ...,
| details: near "(": syntax error
| RUNTIME ERROR (tested program terminated with exit code 1)
Detected some errors.
SqlEventDelta Question :
Write an SQL query that, for each event_type that has been registered more than once, returns the difference between the latest (i.e. the most recent in terms of time) and the second latest value.
The table should be ordered by event_type (in ascending order).
The names of the columns in the rowset don't matter, but their order does.
Given a table events with the following structure :
create table events (
event_type integer not null,
value integer not null,
time timestamp not null,
unique(event_type, time)
);
For example, given the following data :
event_type | value | time
-----------+------------+--------------------
2 | 5 | 2015-05-09 12:42:00
4 | -42 | 2015-05-09 13:19:57
2 | 2 | 2015-05-09 14:48:30
2 | 7 | 2015-05-09 12:54:39
3 | 16 | 2015-05-09 13:19:57
3 | 20 | 2015-05-09 15:01:09
Given the above data, the output should return the following rowset :
event_type | value
-----------+-----------
2 | -5
3 | 4
Thank you.

I tried to use a somehow naive approach. I'm aware that it is very bad for performance due to many subqueries but the catch here is the "DISTINCT ON" of PostgreSQL, however I got 100% 😃
Hope you like it!
select distinct on (event_type) event_type, result * -1
from (select event_type, value, lead(value) over (order by event_type) - value result
from (select *
from events
where event_type in (select event_type
from events
group by event_type
having count(event_type) >= 2)
order by event_type, time desc) a) b

with data as (SELECT a.event_type, a.value, a.time,
--Produce a virtual table that stores the next and previous values for each event_type.
--Order by the time column itself (ascending); quoted 'event_type', 'time' are string constants and do not sort anything.
LEAD(a.value,1) over (PARTITION by a.event_type ORDER by a.time) as recent_val,
LAG(a.value,1) over (PARTITION by a.event_type ORDER by a.time) as penult_val
from events a
JOIN (SELECT event_type
from events --Filter the initial dataset for duplicates. Store in correct order
group by event_type HAVING COUNT(*) > 1
ORDER by event_type) b
on a.event_type = b.event_type) --Compare the virtual table to the filtered dataset
SELECT event_type, ("value"-"penult_val") as diff --Perform the desired arithmetic
from data
where recent_val is NULL --Filter for the most recent value (it has no next row)
order by event_type
Hi team! This one's my answer. It's largely a goopy conglomerate of the answers above, but it reads more simply and it's commented for context. Being a newbie, I hope it helps other newbies.

I had the same problem when using SQLite.
Try the code below with PostgreSQL:
with data as (select
e.event_type,
e.value,
e.time,
lead(e.value,1) over (PARTITION by e.event_type order by e.event_type,e.time asc) as next_val,
lag (e.value,1) over (PARTITION by e.event_type order by e.event_type,e.time asc) as prev_val
from events e)
select distinct d.event_type, (d.value-d.prev_val) as diff
from
events e,data d
where e.event_type = d.event_type
and d.next_val is null
and e.event_type in ( SELECT event_type
from data
group by
event_type
having count(1) > 1)
order by 1;
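For anyone who has to stay on SQLite 3.11.0: window functions such as LAG and LEAD were only added in SQLite 3.25.0, which would explain the syntax error near "(". One possible workaround is a self-join with correlated subqueries. The sketch below is not the asker's logic, just one way to get the expected rowset without window functions; it is run here through Python's sqlite3 module against the sample data from the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
create table events (
  event_type integer not null,
  value integer not null,
  time timestamp not null,
  unique(event_type, time)
);
insert into events values
  (2, 5,   '2015-05-09 12:42:00'),
  (4, -42, '2015-05-09 13:19:57'),
  (2, 2,   '2015-05-09 14:48:30'),
  (2, 7,   '2015-05-09 12:54:39'),
  (3, 16,  '2015-05-09 13:19:57'),
  (3, 20,  '2015-05-09 15:01:09');
""")

# l = latest row per event_type, p = the row just before it.
# Event types with a single row have no p row, so the join drops them.
QUERY = """
select l.event_type, l.value - p.value as delta
from events l
join events p on p.event_type = l.event_type
where l.time = (select max(time) from events
                where event_type = l.event_type)
  and p.time = (select max(time) from events
                where event_type = l.event_type and time < l.time)
order by l.event_type
"""

rows = conn.execute(QUERY).fetchall()
print(rows)
```

On the sample data this prints [(2, -5), (3, 4)], which matches the expected rowset. It has only been checked against that sample, so treat it as a starting point rather than a verified submission.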

Adding another answer involving self joins -
PostgreSQL
-- write your code in PostgreSQL 9.4
WITH TotalRowCount AS (
SELECT
event_type,
COUNT(*) as row_count
FROM events
GROUP BY 1
),
RankedEventType AS (
SELECT
event_type,
value,
ROW_NUMBER() OVER(PARTITION BY event_type ORDER BY time) as row_num
FROM events
)
SELECT
a.event_type,
a.value - b.value as value
FROM RankedEventType a
INNER JOIN TotalRowCount c
ON a.event_type = c.event_type
INNER JOIN RankedEventType b
ON a.event_type = b.event_type
WHERE 1 = 1
AND a.row_num = c.row_count
AND b.row_num = c.row_count - 1
ORDER BY 1

without nested queries, got 100%
with data as (
with count as (select event_type
from events
group by event_type
having count(event_type) >= 2)
select e.event_type , e.value, e.time from events as e inner join count as r on e.event_type=r.event_type order by e.event_type, e.time desc
)
select distinct on (event_type) event_type,
value - (LEAD(value) over (order by event_type)) result from data

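Solution with one subquery
WITH diff AS
(SELECT event_type,
value,
time,
LEAD(value) OVER (PARTITION BY event_type
ORDER BY time DESC) AS prev
FROM events
GROUP BY event_type,
value,
time
)
-- DISTINCT ON keeps the first row per event_type, so the ORDER BY
-- is needed to make that row the latest one.
SELECT DISTINCT ON (event_type) event_type,
value - prev
FROM diff
WHERE prev IS NOT NULL
ORDER BY event_type, time DESC;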

with deltas as (
select distinct event_type,
first_value(value) over (PARTITION by event_type ORDER by time DESC) -
nth_value(value, 2) over (PARTITION by event_type ORDER by time DESC) as delta
from events
)
select * from deltas where delta is not null order by 1;

--in PostgreSQL 9.4
with ct1 as (SELECT
event_type,
value,
time,
rank() over (partition by event_type order by time desc) as rank
from events),
ct2 as (
select event_type, value, rank, lag (value,1) over (partition by event_type order by rank) as previous_value
from ct1
order by event_type)
select event_type, previous_value - value from ct2
where rank = 2
order by event_type

My solution:
--Get table with rank 1, 2 group by event_type
with t2 as(
select event_type, value, rank from (
select event_type, value,
rank() over(
partition by event_type
order by time desc) as rank,
count(*) over (partition by event_type) as count
from events) as t
where t.rank <= 2 and t.count > 1
)
--Calculate diff using Lead() and filter out null diff with max
select t3.event_type, max(t3.diff) from (
select event_type,
value - lead(value, 1) over (
partition by event_type
order by rank) as diff
from t2) as t3
group by t3.event_type

Min() and Max() of multiple attributes in a partition window on SQL Server

I have a timetable in SQL Server that has the [SERV_ID] (service-id), [STATION] (station), [ARR] (arrivaltime), [DEP] (departuretime) of a public transport vehicle. Every Service can be present every day [SERV_DAY].
Target is to summarize Serviceday, Service-line, First-station, Last-station, and the corresponding timestamps. --> One row per service per day.
For [SERV_ID] N170 this would be:
SERV_DAY SERV_ID FIRST_STATION MIN_DEP LAST_STATION MAX_ARR
2019-08-14 00:00:00 N170 Downtown 2019-08-14 06:06:00 CentralStation 2019-08-14 07:11:00
I tried to do this by partitioning by ([SERV_DAY], [SERV_ID]) and then getting MAX([ARR]) and MIN([DEP]) for each partition. This works so far, but now I want to get the corresponding station for each min and max.
SELECT
[SERV_DAY],[SERV_ID],
MAX([ARR]) OVER(PARTITION BY [SERV_DAY],[SERV_ID]) AS MAX_ARR,
MIN([DEP]) OVER(PARTITION BY [SERV_DAY],[SERV_ID]) AS MIN_DEP
FROM #demo
Later I need to add the delay at the last station, which is available in an extended version of the dataset as [ARR_EFFECTIVE] and [DEP_EFFECTIVE]. Hopefully I will be able to add these attributes as soon as I know how to summarize the daily lines as described above.
This topic is close but I do not get how to adapt the "gap & island problem"
Min() and Max() based on partition in sql server
I have set up a demo dataset in dbfiddle
https://dbfiddle.uk/?rdbms=sqlserver_2016&fiddle=52e53d43a49ddb8f67454e576bfa7d74
Can anyone help me to finalize the query?

SELECT
[SERV_DAY]
,[SERV_ID],
FIRST_VALUE(STATION) over (Partition by [SERV_DAY],[SERV_ID] Order by ARR DESC) Station1
, FIRST_VALUE(STATION) over (Partition by [SERV_DAY],[SERV_ID] Order by DEP ASC) Station2
FROM #demo

I think I would use a temp table instead of a CTE if you have a large amount of data, but here is a quick idea on how that should work:
WITH CTE AS
(
SELECT *
, ROW_NUMBER() OVER(PARTITION BY [SERV_DAY],[SERV_ID] ORDER BY ARR ) RN
, ROW_NUMBER() OVER(PARTITION BY [SERV_DAY],[SERV_ID] ORDER BY DEP ) RN2
from #demo
)
SELECT t1.[SERV_DAY],t1.[SERV_ID],t1.[STATION] FIRST_STATION, t1.[DEP] MIN_DEP, t2.STATION LAST_STATION
FROM CTE t1
INNER JOIN CTE t2 on t1.SERV_DAY = t2.SERV_DAY and t1.SERV_ID = t2.SERV_ID and t2.RN2 = 1
WHERE t1.RN = 1

You can do that in two steps:
first add a row_number sorted by ARR descending and another row_number sorted by dep. Then you're able to filter on the rows with row_number = 1 in order to select other columns.
Here's an example how to retrieve the station of the max_arr and the min_dep:
WITH T AS (
SELECT
[SERV_DAY], [SERV_ID], [STATION],
MAX([ARR]) OVER(PARTITION BY [SERV_DAY],[SERV_ID]) AS MAX_ARR,
MIN([DEP]) OVER(PARTITION BY [SERV_DAY],[SERV_ID]) AS MIN_DEP,
ROW_NUMBER() OVER(PARTITION BY [SERV_DAY],[SERV_ID] ORDER BY [ARR] DESC) AS RN_ARR,
ROW_NUMBER() OVER(PARTITION BY [SERV_DAY],[SERV_ID] ORDER BY [DEP]) AS RN_DEP
FROM #demo
)
-- One row per service per day; the aggregates need a GROUP BY
-- on every non-aggregated column.
SELECT [SERV_DAY], [SERV_ID], MAX_ARR, MIN_DEP,
MAX(CASE WHEN RN_ARR = 1 THEN [STATION] END) MAX_ARR_STATION,
MAX(CASE WHEN RN_DEP = 1 THEN [STATION] END) MIN_DEP_STATION
FROM T
GROUP BY [SERV_DAY], [SERV_ID], MAX_ARR, MIN_DEP

In reply to @casenonsensitive: it works using his code with a small modification!
WITH T AS (
SELECT
[SERV_DAY], [SERV_ID], [STATION],
MAX([ARR]) OVER(PARTITION BY [SERV_DAY],[SERV_ID]) AS MAX_ARR,
MIN([DEP]) OVER(PARTITION BY [SERV_DAY],[SERV_ID]) AS MIN_DEP,
ROW_NUMBER() OVER(PARTITION BY [SERV_DAY],[SERV_ID] ORDER BY [ARR] ) AS RN_ARR,
ROW_NUMBER() OVER(PARTITION BY [SERV_DAY],[SERV_ID] ORDER BY [DEP] ) AS RN_DEP
FROM #demo
)
SELECT MAX(CASE WHEN RN_ARR = 1 THEN [STATION] END) MIN_DEP_STATION,
MAX(CASE WHEN RN_DEP = 1 THEN [STATION] END) MAX_ARR_STATION, [SERV_DAY], [SERV_ID], MAX_ARR, MIN_DEP from T
group by [SERV_DAY], [SERV_ID], MIN_DEP, MAX_ARR
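To check what the one-row-per-service-per-day summary should look like, here is a small Python sketch of the same idea: group by (SERV_DAY, SERV_ID), take the station with the earliest departure and the station with the latest arrival. The function name summarize, the intermediate stop Midtown and the assumption that the first stop has no arrival and the last stop no departure are all guesses for illustration, since the fiddle data is not shown here.

```python
from collections import defaultdict

# Hypothetical timetable rows; only the N170 endpoints come from the question.
STOPS = [
    {"serv_day": "2019-08-14", "serv_id": "N170", "station": "Downtown",
     "arr": None, "dep": "2019-08-14 06:06:00"},
    {"serv_day": "2019-08-14", "serv_id": "N170", "station": "Midtown",
     "arr": "2019-08-14 06:30:00", "dep": "2019-08-14 06:32:00"},
    {"serv_day": "2019-08-14", "serv_id": "N170", "station": "CentralStation",
     "arr": "2019-08-14 07:11:00", "dep": None},
]


def summarize(stops):
    """One tuple per (serv_day, serv_id): first station, MIN_DEP, last station, MAX_ARR."""
    groups = defaultdict(list)
    for stop in stops:
        groups[(stop["serv_day"], stop["serv_id"])].append(stop)

    summary = []
    for serv_day, serv_id in sorted(groups):
        rows = groups[(serv_day, serv_id)]
        # Ignore missing times explicitly instead of relying on NULL sort order.
        # Assumes every service has at least one departure and one arrival.
        first = min((r for r in rows if r["dep"] is not None), key=lambda r: r["dep"])
        last = max((r for r in rows if r["arr"] is not None), key=lambda r: r["arr"])
        summary.append((serv_day, serv_id,
                        first["station"], first["dep"],
                        last["station"], last["arr"]))
    return summary


for line in summarize(STOPS):
    print(line)
```

Skipping missing times explicitly is a deliberate choice here: in SQL Server NULLs sort first in ascending order, so a ROW_NUMBER() ordered by [ARR] or [DEP] can behave differently depending on whether the endpoints have NULL times.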

SQL Server Select most recent record (with a twist)

Suppose I have the following table:
ActionDate ActionType
------------ ------------
2018-08-02 12:59:56.000 Drill
2018-08-02 13:20:45.000 Hammer
2018-08-02 14:36:02.000 Drill
I want to select the most recent ActionType based on the ActionDate. This is not a problem using ROW_NUMBER() OVER syntax and either grabbing the first or last record depending on how I sorted. However consider this table setup:
ActionDate ActionType
------------ ------------
2018-08-02 12:59:56.000 Drill
2018-08-02 13:20:45.000
2018-08-02 14:36:02.000 Drill
In this case, since the only action listed is Drill, I want the oldest occurrence, since the Action didn't actually change. Is there a way to satisfy both requirements at the same time?

You can use TOP 1 WITH TIES with a CASE expression.
select top 1 with ties
*
from YourTable
order by
case
when (select count(distinct ActionType) from YourTable) = 1
then row_number() over (order by ActionDate asc)
else row_number() over (order by ActionDate desc)
end
Or in a subquery if you like that better...
select ActionDate, ActionType
from
(select
*,
RN = case
when (select count(distinct ActionType) from YourTable) = 1
then row_number() over (order by ActionDate asc)
else row_number() over (order by ActionDate desc)
end
from YourTable) x
where RN = 1
This assumes the blank is actually a NULL, which is ignored by COUNT(DISTINCT ...). If it is an empty string instead of NULL, then you need to handle that with an additional CASE or IIF, like this:
select top 1 with ties
*
from YourTable
order by
case
when (select count(distinct case when ActionType = '' then null else ActionType end) from YourTable) = 1
then row_number() over (order by ActionDate asc)
else row_number() over (order by ActionDate desc)
end
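The rule that the CASE expression encodes can be written out in a few lines of Python, which makes it easier to test both scenarios from the question. This is only a sketch of the logic, not a replacement for the query; the function name pick_action is made up, and blank or None action types are treated as "no action recorded".

```python
def pick_action(rows):
    """rows: list of (action_date, action_type). Returns the chosen row or None."""
    if not rows:
        return None
    # Distinct non-blank action types, like COUNT(DISTINCT ...) ignoring NULLs.
    distinct_types = {action for _, action in rows if action}
    ordered = sorted(rows, key=lambda r: r[0])
    # Only one kind of action: take the oldest row. Otherwise: the most recent.
    return ordered[0] if len(distinct_types) == 1 else ordered[-1]


MIXED = [
    ("2018-08-02 12:59:56.000", "Drill"),
    ("2018-08-02 13:20:45.000", "Hammer"),
    ("2018-08-02 14:36:02.000", "Drill"),
]
ONLY_DRILL = [
    ("2018-08-02 12:59:56.000", "Drill"),
    ("2018-08-02 13:20:45.000", ""),
    ("2018-08-02 14:36:02.000", "Drill"),
]

print(pick_action(MIXED))
print(pick_action(ONLY_DRILL))
```

The first call returns the 14:36:02 Drill row (most recent), and the second returns the 12:59:56 Drill row (oldest), matching the two requirements in the question.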
