SQL Server: Select most recent record (with a twist)

Suppose I have the following table:
ActionDate ActionType
------------ ------------
2018-08-02 12:59:56.000 Drill
2018-08-02 13:20:45.000 Hammer
2018-08-02 14:36:02.000 Drill
I want to select the most recent ActionType based on ActionDate. This is not a problem with the ROW_NUMBER() OVER syntax: I just grab the first or last record, depending on how I sorted. However, consider this table setup:
ActionDate ActionType
------------ ------------
2018-08-02 12:59:56.000 Drill
2018-08-02 13:20:45.000
2018-08-02 14:36:02.000 Drill
In this case, since the only action listed is Drill, I want the oldest occurrence, because the action didn't actually change. Is there a way to satisfy both requirements at the same time?

You can use TOP 1 WITH TIES with a CASE expression in the ORDER BY.
select top 1 with ties
*
from YourTable
order by
case
when (select count(distinct ActionType) from YourTable) = 1
then row_number() over (order by ActionDate asc)
else row_number() over (order by ActionDate desc)
end
Or in a subquery, if you like that better:
select ActionDate, ActionType
from
(select
*,
RN = case
when (select count(distinct ActionType) from YourTable) = 1
then row_number() over (order by ActionDate asc)
else row_number() over (order by ActionDate desc)
end
from YourTable) x
where RN = 1
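To check the CASE-inside-ROW_NUMBER() logic without a SQL Server instance, here is a minimal sketch that runs the subquery form against SQLite through Python's built-in sqlite3 module. SQLite is only a stand-in here and has no TOP ... WITH TIES, which is why the subquery variant is used. The sample rows come from the question, with the blank ActionType stored as NULL; the helper name pick_action is hypothetical.

```python
import sqlite3

# If the table holds only one distinct non-NULL ActionType, number rows
# oldest-first; otherwise number them newest-first. Row 1 is the answer.
QUERY = """
SELECT ActionDate, ActionType
FROM (
    SELECT *,
           CASE
               WHEN (SELECT COUNT(DISTINCT ActionType) FROM YourTable) = 1
               THEN ROW_NUMBER() OVER (ORDER BY ActionDate ASC)
               ELSE ROW_NUMBER() OVER (ORDER BY ActionDate DESC)
           END AS RN
    FROM YourTable
) x
WHERE RN = 1
"""


def pick_action(rows):
    """Load (ActionDate, ActionType) rows into an in-memory table and run QUERY."""
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE YourTable (ActionDate TEXT, ActionType TEXT)")
    con.executemany("INSERT INTO YourTable VALUES (?, ?)", rows)
    result = con.execute(QUERY).fetchone()
    con.close()
    return result


changed = [
    ("2018-08-02 12:59:56.000", "Drill"),
    ("2018-08-02 13:20:45.000", "Hammer"),
    ("2018-08-02 14:36:02.000", "Drill"),
]
unchanged = [
    ("2018-08-02 12:59:56.000", "Drill"),
    ("2018-08-02 13:20:45.000", None),  # the blank row, stored as NULL
    ("2018-08-02 14:36:02.000", "Drill"),
]
print(pick_action(changed))    # ('2018-08-02 14:36:02.000', 'Drill')
print(pick_action(unchanged))  # ('2018-08-02 12:59:56.000', 'Drill')
```

Note that the distinct count is taken over the whole table, so this chooses one ordering for every row; a per-group version would need PARTITION BY in both the count and the ROW_NUMBER() calls.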
This assumes the blank is actually a NULL, which COUNT(DISTINCT ...) ignores. If it is an empty string instead of NULL, you need to handle that with an additional CASE, IIF, or similar, like this:
select top 1 with ties
*
from YourTable
order by
case
when (select count(distinct case when ActionType = '' then null else ActionType end) from YourTable) = 1
then row_number() over (order by ActionDate asc)
else row_number() over (order by ActionDate desc)
end
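If the blank is stored as an empty string, NULLIF(ActionType, '') is a shorter equivalent of the CASE above; both T-SQL and SQLite provide NULLIF. This small sketch (SQLite via Python's sqlite3 as a stand-in, the question's rows with the blank stored as '') shows how the distinct count changes:

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE YourTable (ActionDate TEXT, ActionType TEXT)")
con.executemany(
    "INSERT INTO YourTable VALUES (?, ?)",
    [
        ("2018-08-02 12:59:56.000", "Drill"),
        ("2018-08-02 13:20:45.000", ""),  # blank stored as an empty string
        ("2018-08-02 14:36:02.000", "Drill"),
    ],
)

# COUNT(DISTINCT ...) counts '' as a real value ...
raw = con.execute("SELECT COUNT(DISTINCT ActionType) FROM YourTable").fetchone()[0]
# ... but mapping '' to NULL first removes it from the count.
cleaned = con.execute(
    "SELECT COUNT(DISTINCT NULLIF(ActionType, '')) FROM YourTable"
).fetchone()[0]
print(raw, cleaned)  # 2 1
```

One caveat: SQL Server ignores trailing spaces when comparing strings with =, so a value of ' ' also matches '' there, whereas SQLite compares strings exactly.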

Related

How to filter to get one unique record using SQL

I have a table similar to this. If there is a confirmed record, I want to select the oldest record and if not, select the most recent one. In this case, I would want the 4_A record.
ID   Record  Type       Date
---  ------  ---------  --------
1_A  1       auto       4/7/2021
2_A  1       confirmed  4/1/2021
3_A  1       suggested  4/5/2021
4_A  1       confirmed  4/2/2021
5_A  1       suggested  4/5/2021
I've been able to use a window function and QUALIFY to filter the most recent one, but I'm not sure how to bring the Type field into the mix.
SELECT * FROM TABLE QUALIFY ROW_NUMBER() OVER (PARTITION BY RECORD ORDER BY RECORD, DATE DESC) = 1;
Let me assume that you mean the oldest confirmed date if there is a confirmed record, and the most recent date otherwise:
SELECT *
FROM TABLE
QUALIFY ROW_NUMBER() OVER (PARTITION BY RECORD
ORDER BY (CASE WHEN Type = 'confirmed' THEN 1 ELSE 2 END),
(CASE WHEN Type = 'confirmed' THEN DATE END) ASC,
DATE DESC
) = 1;
If you really mean the oldest date of any type whenever a confirmed record exists, then:
SELECT *
FROM TABLE
QUALIFY (CASE WHEN COUNT_IF(Type = 'confirmed') OVER (PARTITION BY RECORD) > 0
THEN ROW_NUMBER() OVER (PARTITION BY RECORD ORDER BY DATE)
ELSE ROW_NUMBER() OVER (PARTITION BY RECORD ORDER BY DATE DESC)
END) = 1;
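SQLite has no QUALIFY clause, but the same ROW_NUMBER() filter can be sketched with a subquery. The sketch below uses Python's sqlite3 as a stand-in for the original engine; the table name records and the helper pick are hypothetical. The question's rows are used with dates rewritten in ISO form so that text ordering matches chronological order. With the dates as listed, the oldest confirmed row is 2_A (4/1/2021) rather than 4_A.

```python
import sqlite3

# SQLite has no QUALIFY, so the ROW_NUMBER() filter lives in a subquery.
QUERY = """
SELECT ID, Record, Type, Date
FROM (
    SELECT *,
           ROW_NUMBER() OVER (
               PARTITION BY Record
               ORDER BY CASE WHEN Type = 'confirmed' THEN 1 ELSE 2 END,  -- confirmed rows first
                        CASE WHEN Type = 'confirmed' THEN Date END ASC,  -- then oldest confirmed
                        Date DESC                                        -- otherwise newest row
           ) AS rn
    FROM records
)
WHERE rn = 1
"""


def pick(rows):
    """Return the chosen row per Record for (ID, Record, Type, Date) tuples."""
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE records (ID TEXT, Record INTEGER, Type TEXT, Date TEXT)")
    con.executemany("INSERT INTO records VALUES (?, ?, ?, ?)", rows)
    result = con.execute(QUERY).fetchall()
    con.close()
    return result


rows = [
    ("1_A", 1, "auto", "2021-04-07"),
    ("2_A", 1, "confirmed", "2021-04-01"),
    ("3_A", 1, "suggested", "2021-04-05"),
    ("4_A", 1, "confirmed", "2021-04-02"),
    ("5_A", 1, "suggested", "2021-04-05"),
]
no_confirmed = [r for r in rows if r[2] != "confirmed"]
print(pick(rows))          # [('2_A', 1, 'confirmed', '2021-04-01')]
print(pick(no_confirmed))  # [('1_A', 1, 'auto', '2021-04-07')]
```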

Min() and Max() of multiple attributes in a partition window on SQL Server

I have a timetable in SQL Server that has the [SERV_ID] (service-id), [STATION] (station), [ARR] (arrivaltime), [DEP] (departuretime) of a public transport vehicle. Every Service can be present every day [SERV_DAY].
The target is to summarize the service day, service line, first station, last station, and the corresponding timestamps: one row per service per day.
For [SERV_ID] N170 this would be:
SERV_DAY SERV_ID FIRST_STATION MIN_DEP LAST_STATION MAX_ARR
2019-08-14 00:00:00 N170 Downtown 2019-08-14 06:06:00 CentralStation 2019-08-14 07:11:00
I tried to do this by partitioning by ([SERV_DAY], [SERV_ID]) and then getting MAX([ARR]) and MIN([DEP]) for each partition. This works so far, but now I want to get the station corresponding to each MIN and MAX.
SELECT
[SERV_DAY],[SERV_ID],
MAX([ARR]) OVER(PARTITION BY [SERV_DAY],[SERV_ID]) AS MAX_ARR,
MIN([DEP]) OVER(PARTITION BY [SERV_DAY],[SERV_ID]) AS MIN_DEP
FROM #demo
Later I need to add the delay at the last station, which is available in an extended version of the dataset as [ARR_EFFECTIVE] and [DEP_EFFECTIVE]. Hopefully I will be able to add these attributes as soon as I know how to summarize the daily lines as described above.
This topic is close, but I don't see how to adapt its "gaps and islands" approach:
Min() and Max() based on partition in sql server
I have set up a demo dataset in dbfiddle
https://dbfiddle.uk/?rdbms=sqlserver_2016&fiddle=52e53d43a49ddb8f67454e576bfa7d74
Can anyone help me to finalize the query?
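SELECT DISTINCT
[SERV_DAY]
,[SERV_ID],
-- latest arrival: the last station
FIRST_VALUE(STATION) over (Partition by [SERV_DAY],[SERV_ID] Order by ARR DESC) Station1
-- earliest departure: the first station (NULL DEP values would sort first here)
, FIRST_VALUE(STATION) over (Partition by [SERV_DAY],[SERV_ID] Order by DEP ASC) Station2
FROM #demo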
If you have a large amount of data, I would probably use a temp table instead of a CTE, but here is a quick idea of how it should work:
WITH CTE AS
(
SELECT *
, ROW_NUMBER() OVER(PARTITION BY [SERV_DAY],[SERV_ID] ORDER BY ARR ) RN
, ROW_NUMBER() OVER(PARTITION BY [SERV_DAY],[SERV_ID] ORDER BY DEP ) RN2
from #demo
)
SELECT t1.[SERV_DAY],t1.[SERV_ID],t1.[STATION] FIRST_STATION, t1.[DEP] MIN_DEP, t2.STATION LAST_STATION
FROM CTE t1
INNER JOIN CTE t2 on t1.SERV_DAY = t2.SERV_DAY and t1.SERV_ID = t2.SERV_ID and t2.RN2 = 1
WHERE t1.RN = 1
You can do that in two steps:
First, add a row_number sorted by ARR descending and another row_number sorted by DEP. Then you can filter on the rows with row_number = 1 to pick up the other columns.
Here's an example of how to retrieve the station of the MAX_ARR and the MIN_DEP:
WITH T AS (
SELECT
[SERV_DAY], [SERV_ID], [STATION],
MAX([ARR]) OVER(PARTITION BY [SERV_DAY],[SERV_ID]) AS MAX_ARR,
MIN([DEP]) OVER(PARTITION BY [SERV_DAY],[SERV_ID]) AS MIN_DEP,
ROW_NUMBER() OVER(PARTITION BY [SERV_DAY],[SERV_ID] ORDER BY [ARR] DESC) AS RN_ARR,
ROW_NUMBER() OVER(PARTITION BY [SERV_DAY],[SERV_ID] ORDER BY [DEP]) AS RN_DEP
FROM #demo
)
SELECT [SERV_DAY], [SERV_ID],
MAX(CASE WHEN RN_ARR = 1 THEN [STATION] END) MAX_ARR_STATION,
MAX(CASE WHEN RN_DEP = 1 THEN [STATION] END) MIN_DEP_STATION,
MAX_ARR, MIN_DEP
FROM T
GROUP BY [SERV_DAY], [SERV_ID], MAX_ARR, MIN_DEP
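The ROW_NUMBER() plus conditional aggregation idea can be sketched end to end with SQLite via Python's sqlite3 as a stand-in for SQL Server. The fiddle data is not reproduced here, so the stops below (including the intermediate Midtown stop and service N171) are hypothetical. The sketch assumes the first stop has no arrival and the last stop has no departure, and it sorts NULL departures last so they cannot win RN_DEP.

```python
import sqlite3

QUERY = """
WITH T AS (
    SELECT SERV_DAY, SERV_ID, STATION,
           MAX(ARR) OVER (PARTITION BY SERV_DAY, SERV_ID) AS MAX_ARR,
           MIN(DEP) OVER (PARTITION BY SERV_DAY, SERV_ID) AS MIN_DEP,
           -- RN_ARR = 1: latest arrival (NULL arrivals sort last under DESC)
           ROW_NUMBER() OVER (PARTITION BY SERV_DAY, SERV_ID ORDER BY ARR DESC) AS RN_ARR,
           -- RN_DEP = 1: earliest departure; "DEP IS NULL" pushes NULLs last
           ROW_NUMBER() OVER (PARTITION BY SERV_DAY, SERV_ID
                              ORDER BY DEP IS NULL, DEP) AS RN_DEP
    FROM demo
)
SELECT SERV_DAY, SERV_ID,
       MAX(CASE WHEN RN_DEP = 1 THEN STATION END) AS FIRST_STATION,
       MIN_DEP,
       MAX(CASE WHEN RN_ARR = 1 THEN STATION END) AS LAST_STATION,
       MAX_ARR
FROM T
GROUP BY SERV_DAY, SERV_ID, MIN_DEP, MAX_ARR
ORDER BY SERV_DAY, SERV_ID
"""

DAY = "2019-08-14 00:00:00"
stops = [  # hypothetical (SERV_DAY, SERV_ID, STATION, ARR, DEP) rows
    (DAY, "N170", "Downtown", None, "2019-08-14 06:06:00"),
    (DAY, "N170", "Midtown", "2019-08-14 06:40:00", "2019-08-14 06:41:00"),
    (DAY, "N170", "CentralStation", "2019-08-14 07:11:00", None),
    (DAY, "N171", "CentralStation", None, "2019-08-14 08:00:00"),
    (DAY, "N171", "Downtown", "2019-08-14 08:45:00", None),
]

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE demo (SERV_DAY TEXT, SERV_ID TEXT, STATION TEXT, ARR TEXT, DEP TEXT)")
con.executemany("INSERT INTO demo VALUES (?, ?, ?, ?, ?)", stops)
summary = con.execute(QUERY).fetchall()
for row in summary:
    print(row)
```

Each output row has the shape of the target row for N170 in the question: day, service, first station, earliest departure, last station, latest arrival.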
In reply to @casenonsensitive: it works using his code with a small modification!
WITH T AS (
SELECT
[SERV_DAY], [SERV_ID], [STATION],
MAX([ARR]) OVER(PARTITION BY [SERV_DAY],[SERV_ID]) AS MAX_ARR,
MIN([DEP]) OVER(PARTITION BY [SERV_DAY],[SERV_ID]) AS MIN_DEP,
ROW_NUMBER() OVER(PARTITION BY [SERV_DAY],[SERV_ID] ORDER BY [ARR] ) AS RN_ARR,
ROW_NUMBER() OVER(PARTITION BY [SERV_DAY],[SERV_ID] ORDER BY [DEP] ) AS RN_DEP
FROM #demo
)
SELECT MAX(CASE WHEN RN_ARR = 1 THEN [STATION] END) MIN_DEP_STATION,
MAX(CASE WHEN RN_DEP = 1 THEN [STATION] END) MAX_ARR_STATION, [SERV_DAY], [SERV_ID], MAX_ARR, MIN_DEP from T
group by [SERV_DAY], [SERV_ID], MIN_DEP, MAX_ARR

combine row groups within time range

I am facing a problem with a query and have been stuck on it for quite some time now. Here is the situation: I have a table with certain records in it, each bounded by ValidFrom and ValidTo. This table tracks data changes of another table: every time the underlying data changes, the last valid record is terminated and a new one is inserted. Here is a SQL Fiddle example:
http://sqlfiddle.com/#!18/15c7f/4/0
What I am trying to achieve is to merge records with identical flags within a time span into one record. In my fiddle example, I would expect the first two records to be combined into one record with ValidFrom 2017-01-01 and ValidTo 2017-01-10.
Anyway, I am severely stuck. I have tried numerous approaches I found here and in another forum, but without success. One approach is included in the fiddle: compute the row number ordered by date and subtract the row number partitioned by the flag columns, etc., but nothing works out.
Any help would be highly appreciated.
Try this query:
select keycol, min(validfrom), max(validto), flag1, flag2 from
(
select *,
sum(iscontinuation) over (partition by keycol order by validfrom rows between UNBOUNDED PRECEDING AND CURRENT ROW) [GroupingCol]
from (
select *,
case when
lag(validto) over (partition by keycol order by validfrom) = dateadd(day, -1, validfrom) and
lag(flag1) over (partition by keycol order by validfrom) = flag1 and
lag(flag2) over (partition by keycol order by validfrom) = flag2 then 0
else 1 end [IsContinuation]
from t
) a
) b group by keycol, flag1, flag2, groupingcol
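Here is a sketch of the LAG plus running SUM approach above, run against SQLite through Python's sqlite3 as a stand-in for SQL Server (SQLite's date(x, '-1 day') replaces DATEADD). The rows are hypothetical, modeled on the result described in the question. The marker is renamed new_group in the sketch because it is 1 when a row starts a new island.

```python
import sqlite3

QUERY = """
SELECT keycol, MIN(validfrom) AS validfrom, MAX(validto) AS validto, flag1, flag2
FROM (
    SELECT *,
           -- running total of new_group markers = island number per keycol
           SUM(new_group) OVER (PARTITION BY keycol ORDER BY validfrom
                                ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) AS groupingcol
    FROM (
        SELECT *,
               -- 0 if this row continues the previous one (adjacent dates, same flags), else 1
               CASE WHEN LAG(validto) OVER (PARTITION BY keycol ORDER BY validfrom)
                         = date(validfrom, '-1 day')
                     AND LAG(flag1) OVER (PARTITION BY keycol ORDER BY validfrom) = flag1
                     AND LAG(flag2) OVER (PARTITION BY keycol ORDER BY validfrom) = flag2
                    THEN 0 ELSE 1 END AS new_group
        FROM t
    ) a
) b
GROUP BY keycol, flag1, flag2, groupingcol
ORDER BY 1, 2  -- keycol, then start of each merged range
"""

con = sqlite3.connect(":memory:")
con.execute(
    "CREATE TABLE t (keycol INTEGER, validfrom TEXT, validto TEXT, flag1 INTEGER, flag2 INTEGER)"
)
con.executemany("INSERT INTO t VALUES (?, ?, ?, ?, ?)", [  # hypothetical rows
    (1, "2017-01-01", "2017-01-05", 0, 1),
    (1, "2017-01-06", "2017-01-10", 0, 1),
    (1, "2017-01-11", "2017-01-31", 1, 1),
])
merged = con.execute(QUERY).fetchall()
print(merged)
# [(1, '2017-01-01', '2017-01-10', 0, 1), (1, '2017-01-11', '2017-01-31', 1, 1)]
```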
You need to use MIN and MAX partitioned by your group. I got some help from Michal's query. This can handle multiple flag columns as well:
With CTE2 as(
select *,
-- consecutive rows with the same flags share a GroupingCol value
row_number() over (partition by KeyCol order by ValidFrom) -
row_number() over (partition by KeyCol, Flag1, Flag2 order by ValidFrom) [GroupingCol]
from t)
Select KeyCol,Flag1,Flag2,ValidFrom,ValidTo From (
Select KeyCol,Flag1,Flag2,
min(ValidFrom) over (partition by KeyCol,Flag1,Flag2,GroupingCol) ValidFrom ,
Max(ValidTo) over (partition by KeyCol,Flag1,Flag2,GroupingCol) ValidTo,
Row_number() over (partition by KeyCol,Flag1,Flag2,GroupingCol order by keycol,ValidFrom) RN
From CTE2) A where RN=1
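For comparison, the row-number difference mentioned in the question (row number over all rows minus row number within each flag combination) merges runs of any length. Below is a minimal SQLite sketch via Python's sqlite3 with hypothetical rows. Unlike the LAG version, it does not check that ValidTo and the next ValidFrom are contiguous.

```python
import sqlite3

QUERY = """
SELECT keycol, flag1, flag2, MIN(validfrom) AS validfrom, MAX(validto) AS validto
FROM (
    SELECT *,
           -- constant within each run of consecutive rows sharing the same flags
           ROW_NUMBER() OVER (PARTITION BY keycol ORDER BY validfrom)
           - ROW_NUMBER() OVER (PARTITION BY keycol, flag1, flag2 ORDER BY validfrom) AS grp
    FROM t
)
GROUP BY keycol, flag1, flag2, grp
ORDER BY 1, 4  -- keycol, then start of each merged range
"""

con = sqlite3.connect(":memory:")
con.execute(
    "CREATE TABLE t (keycol INTEGER, validfrom TEXT, validto TEXT, flag1 INTEGER, flag2 INTEGER)"
)
con.executemany("INSERT INTO t VALUES (?, ?, ?, ?, ?)", [  # hypothetical rows
    (1, "2017-01-01", "2017-01-05", 0, 1),
    (1, "2017-01-06", "2017-01-10", 0, 1),
    (1, "2017-01-11", "2017-01-15", 0, 1),  # third row of the same run
    (1, "2017-01-16", "2017-01-20", 1, 1),
    (1, "2017-01-21", "2017-01-31", 0, 1),  # same flags again, but a new island
])
islands = con.execute(QUERY).fetchall()
for row in islands:
    print(row)
```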

Eliminating duplicate records in SQL

I have a table called attribute_value with the following columns
attribute_id | start_date | value | latest_ind | mod_dtime
The latest_ind column can have a value of either 1 or 0.
I basically want to run an update script on this table that finds all the attributes that share a start date and have latest_ind equal to one, and sets latest_ind to zero EXCEPT for the record that is the latest one.
I've managed to put together the following SELECT query but I have no idea how I would go about converting it into an update. Any pointers would be appreciated
SELECT av.attribute_id, av.start_date, count(latest_ind), max(mod_dtime)
FROM t_attribute_value av
where latest_ind = 1
group by attribute_id, start_date
having count(latest_ind) > 1
This is a case where an UPDATE using a CTE comes in handy:
;WITH ToUpdate AS (
SELECT latest_ind,
ROW_NUMBER() OVER (PARTITION BY attribute_id, start_date
ORDER BY mod_dtime DESC) AS rn
FROM attribute_value
WHERE latest_ind = 1
)
UPDATE ToUpdate
SET latest_ind = 0
WHERE rn > 1
The update operation propagates to the underlying table. Hence, for every (attribute_id, start_date) partition with more than one row, all records except the latest are updated.
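SQLite cannot UPDATE through a CTE, so this sketch (Python's sqlite3 as a stand-in for SQL Server, with hypothetical rows) ranks the rows in a subquery and targets the non-latest ones by SQLite's implicit rowid instead. The ranking itself is the same ROW_NUMBER() as above.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("""CREATE TABLE attribute_value (
    attribute_id INTEGER, start_date TEXT, value TEXT, latest_ind INTEGER, mod_dtime TEXT)""")
con.executemany("INSERT INTO attribute_value VALUES (?, ?, ?, ?, ?)", [  # hypothetical rows
    (1, "2020-01-01", "a", 1, "2020-01-01 09:00"),
    (1, "2020-01-01", "b", 1, "2020-01-02 09:00"),  # latest for (1, 2020-01-01)
    (2, "2020-02-01", "c", 1, "2020-02-01 09:00"),  # only row in its group
])

# Rank flagged rows per (attribute_id, start_date), newest first, and
# clear latest_ind on every row that is not ranked 1.
con.execute("""
UPDATE attribute_value
SET latest_ind = 0
WHERE rowid IN (
    SELECT rid FROM (
        SELECT rowid AS rid,
               ROW_NUMBER() OVER (PARTITION BY attribute_id, start_date
                                  ORDER BY mod_dtime DESC) AS rn
        FROM attribute_value
        WHERE latest_ind = 1
    )
    WHERE rn > 1
)
""")
rows = con.execute(
    "SELECT attribute_id, value, latest_ind FROM attribute_value ORDER BY rowid"
).fetchall()
print(rows)  # [(1, 'a', 0), (1, 'b', 1), (2, 'c', 1)]
```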
Maybe something like this:
Method 1 : With CTE
;WITH T AS
( SELECT attribute_id, start_date, latest_ind,
ROW_NUMBER() OVER (PARTITION BY attribute_id, start_date ORDER BY mod_dtime DESC) RN
FROM t_attribute_value
where latest_ind = 1
)
UPDATE T
SET latest_ind = 0
WHERE RN > 1
Method 2: You don't need a CTE for this
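UPDATE T
SET T.latest_ind = 0
FROM t_attribute_value T
INNER JOIN
(
SELECT attribute_id, start_date, mod_dtime,
ROW_NUMBER() OVER (PARTITION BY attribute_id, start_date ORDER BY mod_dtime DESC) RN
FROM t_attribute_value
where latest_ind = 1
) V
-- assumes (attribute_id, start_date, mod_dtime) identifies a single row
ON T.attribute_id = V.attribute_id AND T.start_date = V.start_date
AND T.mod_dtime = V.mod_dtime AND T.latest_ind = 1 AND V.RN > 1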

Find minimum value in groups of rows

In the SQL space (specifically T-SQL, SQL Server 2008), given this list of values:
Status Date
------ -----------------------
ACT 2012-01-07 11:51:06.060
ACT 2012-01-07 11:51:07.920
ACT 2012-01-08 04:13:29.140
NOS 2012-01-09 04:29:16.873
ACT 2012-01-21 12:39:37.607 <-- THIS
ACT 2012-01-21 12:40:03.840
ACT 2012-05-02 16:27:17.370
GRAD 2012-05-19 13:30:02.503
GRAD 2013-09-03 22:58:48.750
Generated from this query:
SELECT Status, Date
FROM Account_History
WHERE AccountNumber = '1234'
ORDER BY Date
The status for this particular object started at ACT, then changed to NOS, then back to ACT, then to GRAD.
What is the best way to get the minimum date from the latest "group" of records where Status = 'ACT'?
Here is a query that does this, by identifying the groups where the student statuses are the same and then using simple aggregation:
select top 1 StudentStatus, min(WhenLastChanged) as WhenLastChanged
from (SELECT StudentStatus, WhenLastChanged,
(row_number() over (order by WhenLastChanged) -
row_number() over (partition by StudentStatus order by WhenLastChanged)
) as grp
FROM Account_History
WHERE AccountNumber = '1234'
) t
where StudentStatus = 'ACT'
group by StudentStatus, grp
order by WhenLastChanged desc;
The row_number() function assigns sequential numbers within groups of rows based on the date. For your data, the two row_number() values and their difference are:
Status Date                    rn_all rn_status grp
------ ----------------------- ------ --------- ---
ACT 2012-01-07 11:51:06.060 1 1 0
ACT 2012-01-07 11:51:07.920 2 2 0
ACT 2012-01-08 04:13:29.140 3 3 0
NOS 2012-01-09 04:29:16.873 4 1 3
ACT 2012-01-21 12:39:37.607 5 4 1
ACT 2012-01-21 12:40:03.840 6 5 1
ACT 2012-05-02 16:27:17.370 7 6 1
GRAD 2012-05-19 13:30:02.503 8 1 7
GRAD 2013-09-03 22:58:48.750 9 2 7
Notice that the last column is constant for consecutive rows that have the same status.
The aggregation brings these together and chooses the latest (top 1 . . . order by date desc) of the first dates (min(date)).
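The row-number difference can be reproduced with SQLite via Python's sqlite3 as a quick check. This sketch uses the question's column names (Status, Date) and data rather than StudentStatus/WhenLastChanged, and LIMIT 1 stands in for TOP 1.

```python
import sqlite3

history = [  # (Status, Date) rows from the question, account 1234
    ("ACT", "2012-01-07 11:51:06.060"),
    ("ACT", "2012-01-07 11:51:07.920"),
    ("ACT", "2012-01-08 04:13:29.140"),
    ("NOS", "2012-01-09 04:29:16.873"),
    ("ACT", "2012-01-21 12:39:37.607"),
    ("ACT", "2012-01-21 12:40:03.840"),
    ("ACT", "2012-05-02 16:27:17.370"),
    ("GRAD", "2012-05-19 13:30:02.503"),
    ("GRAD", "2013-09-03 22:58:48.750"),
]
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE Account_History (AccountNumber TEXT, Status TEXT, Date TEXT)")
con.executemany("INSERT INTO Account_History VALUES ('1234', ?, ?)", history)

# Overall row number minus per-status row number: constant within each run.
NUMBERED = """
SELECT Status, Date,
       ROW_NUMBER() OVER (ORDER BY Date)
       - ROW_NUMBER() OVER (PARTITION BY Status ORDER BY Date) AS grp
FROM Account_History
WHERE AccountNumber = '1234'
"""
grps = [grp for _, _, grp in con.execute(NUMBERED + "ORDER BY Date")]

# Start date (MIN) of each ACT run, keeping only the most recent run.
latest_act_start = con.execute(
    f"""
    SELECT MIN(Date) AS started
    FROM ({NUMBERED})
    WHERE Status = 'ACT'
    GROUP BY Status, grp
    ORDER BY started DESC
    LIMIT 1
    """
).fetchone()[0]
print(grps)              # [0, 0, 0, 3, 1, 1, 1, 7, 7]
print(latest_act_start)  # 2012-01-21 12:39:37.607
```

The grp values match the last column of the table above, and the result is the row marked THIS in the question.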
EDIT:
The query is easy to tweak for multiple account numbers. I probably should have written it that way to begin with, except that the final selection is trickier. This version returns the start date of every ACT group for each account:
select AccountNumber, StudentStatus, min(WhenLastChanged) as WhenLastChanged
from (SELECT StudentStatus, WhenLastChanged, AccountNumber,
(row_number() over (partition by AccountNumber order by WhenLastChanged) -
row_number() over (partition by AccountNumber, studentstatus order by WhenLastChanged)
) as grp
FROM Account_History
) t
where StudentStatus = 'ACT'
group by AccountNumber, StudentStatus, grp
order by WhenLastChanged desc;
But you can't get the last one per account quite so easily. Another level of subqueries:
select AccountNumber, StudentStatus, WhenLastChanged
from (select AccountNumber, StudentStatus, min(WhenLastChanged) as WhenLastChanged,
row_number() over (partition by AccountNumber, StudentStatus order by min(WhenLastChanged) desc
) as seqnum
from (SELECT AccountNumber, StudentStatus, WhenLastChanged,
(row_number() over (partition by AccountNumber order by WhenLastChanged) -
row_number() over (partition by AccountNumber, studentstatus order by WhenLastChanged)
) as grp
FROM Account_History
) t
where StudentStatus = 'ACT'
group by AccountNumber, StudentStatus, grp
) t
where seqnum = 1;
This uses aggregation along with the window function row_number(). This is assigning sequential numbers to the groups (after aggregation), with the last date for each account getting a value of 1 (order by min(WhenLastChanged) desc). The outermost select then just chooses that row for each account.
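The per-account variant can be sketched the same way with SQLite via Python's sqlite3. Here the numbering, the aggregation into ACT runs, and the ranking are split into CTEs for readability rather than nested subqueries. Account 1234 is the question's data (with Status/Date as column names); account 5678 is hypothetical, added to show one result row per account.

```python
import sqlite3

rows = [  # (AccountNumber, Status, Date); account 5678 is hypothetical
    ("1234", "ACT", "2012-01-07 11:51:06.060"),
    ("1234", "ACT", "2012-01-07 11:51:07.920"),
    ("1234", "ACT", "2012-01-08 04:13:29.140"),
    ("1234", "NOS", "2012-01-09 04:29:16.873"),
    ("1234", "ACT", "2012-01-21 12:39:37.607"),
    ("1234", "ACT", "2012-01-21 12:40:03.840"),
    ("1234", "ACT", "2012-05-02 16:27:17.370"),
    ("1234", "GRAD", "2012-05-19 13:30:02.503"),
    ("1234", "GRAD", "2013-09-03 22:58:48.750"),
    ("5678", "ACT", "2013-01-01 10:00:00.000"),
    ("5678", "NOS", "2013-02-01 10:00:00.000"),
    ("5678", "ACT", "2013-03-01 10:00:00.000"),
    ("5678", "ACT", "2013-03-05 10:00:00.000"),
]
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE Account_History (AccountNumber TEXT, Status TEXT, Date TEXT)")
con.executemany("INSERT INTO Account_History VALUES (?, ?, ?)", rows)

QUERY = """
WITH numbered AS (       -- island id per account and status
    SELECT AccountNumber, Status, Date,
           ROW_NUMBER() OVER (PARTITION BY AccountNumber ORDER BY Date)
           - ROW_NUMBER() OVER (PARTITION BY AccountNumber, Status ORDER BY Date) AS grp
    FROM Account_History
),
act_islands AS (         -- start date of every ACT run
    SELECT AccountNumber, MIN(Date) AS started
    FROM numbered
    WHERE Status = 'ACT'
    GROUP BY AccountNumber, grp
),
ranked AS (              -- latest run per account gets seqnum 1
    SELECT AccountNumber, started,
           ROW_NUMBER() OVER (PARTITION BY AccountNumber ORDER BY started DESC) AS seqnum
    FROM act_islands
)
SELECT AccountNumber, started
FROM ranked
WHERE seqnum = 1
ORDER BY AccountNumber
"""
latest = con.execute(QUERY).fetchall()
print(latest)
# [('1234', '2012-01-21 12:39:37.607'), ('5678', '2013-03-01 10:00:00.000')]
```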
SELECT [Status], MIN([Date])
FROM Table_Name
WHERE [Status] = (SELECT [Status]
FROM Table_Name
WHERE [Date] = (SELECT MAX([Date])
FROM Table_Name)
)
GROUP BY [Status]
Try it on SQL Fiddle.
Hogan: basically, yes. I just want to know the date/time when the
account was last changed to ACT. The records after the point above
marked THIS are just extra.
Instead of looking only for ACT, we can find every point where the status changes and then pick the latest ACT one (the max) from those.
So, every time a status changes:
with rownumb as
(
select *, row_number() OVER (order by date asc) as rn
from Account_History
where AccountNumber = '1234'
)
-- B is the first row after each change
select B.status, B.date
from rownumb A
join rownumb B on A.rn = B.rn-1
where A.status != B.status
Now find the max date among the ACT change points:
with rownumb as
(
select *, row_number() OVER (order by date asc) as rn
from Account_History
where AccountNumber = '1234'
), statuschange as
(
select B.status, B.date
from rownumb A
join rownumb B on A.rn = B.rn-1
where A.status != B.status
)
select max(date)
from statuschange
where status = 'ACT'
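A sketch of the self-join version with SQLite via Python's sqlite3 as a stand-in, on the question's data (column names Status and Date assumed). B is the row right after each change, so its date is when the new status began. One caveat: the very first row is never a change point, so an account that has always been ACT would return NULL.

```python
import sqlite3

history = [  # (Status, Date) rows from the question, account 1234
    ("ACT", "2012-01-07 11:51:06.060"),
    ("ACT", "2012-01-07 11:51:07.920"),
    ("ACT", "2012-01-08 04:13:29.140"),
    ("NOS", "2012-01-09 04:29:16.873"),
    ("ACT", "2012-01-21 12:39:37.607"),
    ("ACT", "2012-01-21 12:40:03.840"),
    ("ACT", "2012-05-02 16:27:17.370"),
    ("GRAD", "2012-05-19 13:30:02.503"),
    ("GRAD", "2013-09-03 22:58:48.750"),
]
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE Account_History (AccountNumber TEXT, Status TEXT, Date TEXT)")
con.executemany("INSERT INTO Account_History VALUES ('1234', ?, ?)", history)

QUERY = """
WITH rownumb AS (
    SELECT *, ROW_NUMBER() OVER (ORDER BY Date) AS rn
    FROM Account_History
    WHERE AccountNumber = '1234'
), statuschange AS (
    -- pair each row (A) with the next one (B); keep B where the status differs
    SELECT B.Status, B.Date
    FROM rownumb A
    JOIN rownumb B ON A.rn = B.rn - 1
    WHERE A.Status <> B.Status
)
SELECT MAX(Date) FROM statuschange WHERE Status = 'ACT'
"""
last_act_change = con.execute(QUERY).fetchone()[0]
print(last_act_change)  # 2012-01-21 12:39:37.607
```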