combine row groups within time range - sql

I am facing a problem with a query and have been stuck for quite some time now. Here is the situation: I have a table whose records are delimited by ValidFrom and ValidTo. This table tracks data changes of another table: every time the underlying data changes, the last valid record is terminated and a new one is inserted. Here is a SQL Fiddle example:
http://sqlfiddle.com/#!18/15c7f/4/0
What I am trying to achieve is to group records with identical flags within a time span into one record. In my fiddle example, I would expect the first two records to be combined into one record with ValidFrom 2017-01-01 and ValidTo 2017-01-10.
Anyway, I am severely stuck. I have tried numerous approaches I found here and in another forum, but without success. One approach is included in the fiddle: evaluate the row number ordered by date and subtract the row number partitioned by the flag columns, etc. ... but nothing works out.
Any help would be highly appreciated.

Try this query:
select keycol, min(validfrom), max(validto), flag1, flag2 from
(
    select *,
        sum(iscontinuation) over (partition by keycol order by validfrom rows between UNBOUNDED PRECEDING AND CURRENT ROW) [GroupingCol]
    from (
        select *,
            case when
                lag(validto) over (partition by keycol order by validfrom) = dateadd(day, -1, validfrom) and
                lag(flag1) over (partition by keycol order by validfrom) = flag1 and
                lag(flag2) over (partition by keycol order by validfrom) = flag2 then 0
            else 1 end [IsContinuation]
        from t
    ) a
) b group by keycol, flag1, flag2, groupingcol
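To try the lag-plus-running-sum idea without a SQL Server instance, here is a minimal sketch that runs an equivalent query through Python's sqlite3 module. It assumes SQLite 3.25 or later (for window functions), uses `date(validfrom, '-1 day')` in place of `dateadd`, and the sample rows are hypothetical, not the fiddle data. The marker column is named `is_new_group` here because it is 1 whenever a row starts a new group.

```python
import sqlite3

# Hypothetical sample rows: (keycol, validfrom, validto, flag1, flag2)
SAMPLE_ROWS = [
    (1, "2017-01-01", "2017-01-05", 1, 0),
    (1, "2017-01-06", "2017-01-10", 1, 0),  # directly continues the previous row
    (1, "2017-01-11", "2017-01-20", 0, 0),  # flags change: new group
    (1, "2017-01-21", "2017-01-31", 1, 0),  # flags change again: new group
]

MERGE_SQL = """
select keycol, min(validfrom), max(validto), flag1, flag2
from (
    select a.*,
           -- running sum of the start markers yields one number per island
           sum(is_new_group) over (partition by keycol order by validfrom
                                   rows between unbounded preceding and current row) as grp
    from (
        select t.*,
               case when lag(validto) over (partition by keycol order by validfrom)
                         = date(validfrom, '-1 day')
                     and lag(flag1) over (partition by keycol order by validfrom) = flag1
                     and lag(flag2) over (partition by keycol order by validfrom) = flag2
                    then 0 else 1 end as is_new_group
        from t
    ) a
) b
group by keycol, flag1, flag2, grp
order by keycol, min(validfrom)
"""


def merge_ranges(rows):
    """Load rows into an in-memory table and return the merged validity ranges."""
    con = sqlite3.connect(":memory:")
    try:
        con.execute(
            "create table t (keycol integer, validfrom text, validto text, "
            "flag1 integer, flag2 integer)"
        )
        con.executemany("insert into t values (?, ?, ?, ?, ?)", rows)
        return con.execute(MERGE_SQL).fetchall()
    finally:
        con.close()


if __name__ == "__main__":
    for row in merge_ranges(SAMPLE_ROWS):
        print(row)
```

With these sample rows the first two records collapse into a single range, while the later rows stay separate because their flags differ from the preceding row.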

You need to use min and max with partition by over your group. I got some help from Michal's query. This can handle multiple flag columns as well:
With CTE2 as(
select *,
(case when flag1 = lag(flag1) over (order by (select null)) and
flag2 = lag(flag2) over (order by (select null)) then -1 else 0 end) +
row_number() over (order by (select null)) [GroupingCol]
from t)
Select KeyCol,Flag1,Flag2,ValidFrom,ValidTo From (
Select KeyCol,Flag1,Flag2,
min(ValidFrom) over (partition by KeyCol,Flag1,Flag2,GroupingCol) ValidFrom ,
Max(ValidTo) over (partition by KeyCol,Flag1,Flag2,GroupingCol) ValidTo,
Row_number() over (partition by KeyCol,Flag1,Flag2,GroupingCol order by keycol,ValidFrom) RN
From CTE2) A where RN=1

Related

Complex Ranking in SQL (Teradata)

I have a peculiar problem at hand. I need to rank in the following manner:
Each ID gets a new rank.
Rank #1 is assigned to the ID with the lowest date. However, the subsequent dates for that particular ID can be higher, but they will still get the incremental rank relative to other IDs.
(E.g. the ADF32 series will be ranked first as it has the lowest date; although it ends with dates in 09-Nov and RT659 starts with 13-Aug, RT659 will be ranked after it.)
For a particular ID, if the days are consecutive then the ranks are the same; otherwise they increase by 1.
For a particular ID, ranks are given in date ASC.
How to formulate a query?
You need two steps:
select
id_col
,dt_col
,dense_rank()
over (order by min_dt, id_col, dt_col - rnk) as part_col
from
(
select
id_col
,dt_col
,min(dt_col)
over (partition by id_col) as min_dt
,rank()
over (partition by id_col
order by dt_col) as rnk
from tab
) as dt
dt_col - rnk calculates the same result for consecutive dates -> same rank
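The "date minus rank" trick can be checked on a small data set. The sketch below is an illustration only: it runs the same two-step query in SQLite (3.25 or later) via Python's sqlite3 module rather than in Teradata, replaces the date arithmetic with `julianday(dt_col) - rnk`, and uses made-up dates for the two IDs mentioned in the question.

```python
import sqlite3

# Hypothetical sample rows: (id_col, dt_col)
SAMPLE_ROWS = [
    ("ADF32", "2018-08-01"),
    ("ADF32", "2018-08-02"),  # consecutive day: same rank as the row above
    ("ADF32", "2018-08-05"),  # gap: rank increases
    ("ADF32", "2018-11-09"),  # gap: rank increases
    ("RT659", "2018-08-13"),  # later first date: ranked after all ADF32 rows
    ("RT659", "2018-08-14"),
]

RANK_SQL = """
select id_col, dt_col,
       dense_rank() over (order by min_dt, id_col, julianday(dt_col) - rnk) as part_col
from (
    select id_col, dt_col,
           min(dt_col) over (partition by id_col) as min_dt,
           rank() over (partition by id_col order by dt_col) as rnk
    from tab
) as dt
order by part_col, dt_col
"""


def rank_rows(rows):
    """Return (id_col, dt_col, rank) tuples for the given rows."""
    con = sqlite3.connect(":memory:")
    try:
        con.execute("create table tab (id_col text, dt_col text)")
        con.executemany("insert into tab values (?, ?)", rows)
        return con.execute(RANK_SQL).fetchall()
    finally:
        con.close()


if __name__ == "__main__":
    for row in rank_rows(SAMPLE_ROWS):
        print(row)
```

Within one ID, each consecutive day raises both the date and the rank by one, so their difference stays constant for the whole run and changes only after a gap.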
Try datediff on lead/lag and then perform partitioned ranking
select t.ID_COL,t.dt_col,
rank() over(partition by t.ID_COL, t.date_diff order by t.dt_col desc) as rankk
from ( SELECT ID_COL,dt_col,
DATEDIFF(day, Lag(dt_col, 1) OVER(ORDER BY dt_col),dt_col) as date_diff FROM table1 ) t
One way to think about this problem is "when to add 1 to the rank". Well, that occurs when the previous value on a row with the same id_col differs by more than one day. Or when the row is the earliest day for an id.
This turns the problem into a cumulative sum:
select t.*,
sum(case when prev_dt_col = dt_col - 1 then 0 else 1
end) over
(order by min_dt_col, id_col, dt_col) as ranking
from (select t.*,
lag(dt_col) over (partition by id_col order by dt_col) as prev_dt_col,
min(dt_col) over (partition by id_col) as min_dt_col
from t
) t;

Min() and Max() of multiple attributes in a partition window on SQL Server

I have a timetable in SQL Server that has the [SERV_ID] (service-id), [STATION] (station), [ARR] (arrivaltime), [DEP] (departuretime) of a public transport vehicle. Every Service can be present every day [SERV_DAY].
The goal is to summarize service day, service line, first station, last station, and the corresponding timestamps, i.e. one row per service per day.
For [SERV_ID] N170 this would be:
SERV_DAY SERV_ID FIRST_STATION MIN_DEP LAST_STATION MAX_ARR
2019-08-14 00:00:00 N170 Downtown 2019-08-14 06:06:00 CentralStation 2019-08-14 07:11:00
I tried to do this by partitioning by ([SERV_DAY], [SERV_ID]) and then getting MAX([ARR]) and MIN([DEP]) for each partition. This works so far, but now I want to get the corresponding station for each min and max.
SELECT
[SERV_DAY],[SERV_ID],
MAX([ARR]) OVER(PARTITION BY [SERV_DAY],[SERV_ID]) AS MAX_ARR,
MIN([DEP]) OVER(PARTITION BY [SERV_DAY],[SERV_ID]) AS MIN_DEP
FROM #demo
Later I need to add the delay at the last station, which is available in an extended version of the dataset as [ARR_EFFECTIVE] and [DEP_EFFECTIVE]. Hopefully I will be able to add these attributes as soon as I know how to summarize the daily lines as described above.
This topic is close, but I do not get how to adapt the "gaps and islands" approach:
Min() and Max() based on partition in sql server
I have set up a demo dataset in dbfiddle
https://dbfiddle.uk/?rdbms=sqlserver_2016&fiddle=52e53d43a49ddb8f67454e576bfa7d74
Can anyone help me to finalize the query?
SELECT
[SERV_DAY]
,[SERV_ID],
FIRST_VALUE(STATION) over (Partition by [SERV_DAY],[SERV_ID] Order by ARR DESC) Station1
, FIRST_VALUE(STATION) over (Partition by [SERV_DAY],[SERV_ID] Order by DEP ASC) Station2
FROM #demo
I think I would use a temp table instead of a CTE if you have a large amount of data, but here is a quick idea on how that should work:
WITH CTE AS
(
SELECT *
, ROW_NUMBER() OVER(PARTITION BY [SERV_DAY],[SERV_ID] ORDER BY ARR ) RN
, ROW_NUMBER() OVER(PARTITION BY [SERV_DAY],[SERV_ID] ORDER BY DEP ) RN2
from #demo
)
SELECT t1.[SERV_DAY],t1.[SERV_ID],t1.[STATION] FIRST_STATION, t1.[DEP] MIN_DEP, t2.STATION LAST_STATION
FROM CTE t1
INNER JOIN CTE t2 on t1.SERV_DAY = t2.SERV_DAY and t1.SERV_ID = t2.SERV_ID and t2.RN2 = 1
WHERE t1.RN = 1
You can do that in two steps:
First add a row_number sorted by ARR descending and another row_number sorted by DEP. Then you're able to filter on the rows with row_number = 1 in order to select other columns.
Here's an example of how to retrieve the station of the max_arr and the min_dep:
WITH T AS (
    SELECT
        -- list the needed columns explicitly: adding * here would repeat column names inside the CTE
        [SERV_DAY], [SERV_ID], [STATION],
        MAX([ARR]) OVER(PARTITION BY [SERV_DAY],[SERV_ID]) AS MAX_ARR,
        MIN([DEP]) OVER(PARTITION BY [SERV_DAY],[SERV_ID]) AS MIN_DEP,
        ROW_NUMBER() OVER(PARTITION BY [SERV_DAY],[SERV_ID] ORDER BY [ARR] DESC) AS RN_ARR,
        ROW_NUMBER() OVER(PARTITION BY [SERV_DAY],[SERV_ID] ORDER BY [DEP]) AS RN_DEP
    FROM #demo
)
SELECT [SERV_DAY], [SERV_ID],
    MAX(CASE WHEN RN_ARR = 1 THEN [STATION] END) MAX_ARR_STATION,
    MAX(CASE WHEN RN_DEP = 1 THEN [STATION] END) MIN_DEP_STATION,
    MAX_ARR, MIN_DEP
FROM T
-- the aggregates need a GROUP BY to return one row per service per day
GROUP BY [SERV_DAY], [SERV_ID], MAX_ARR, MIN_DEP
In reply to @casenonsensitive: it works using his code with a small modification!
WITH T AS (
SELECT
[SERV_DAY], [SERV_ID], [STATION],
MAX([ARR]) OVER(PARTITION BY [SERV_DAY],[SERV_ID]) AS MAX_ARR,
MIN([DEP]) OVER(PARTITION BY [SERV_DAY],[SERV_ID]) AS MIN_DEP,
ROW_NUMBER() OVER(PARTITION BY [SERV_DAY],[SERV_ID] ORDER BY [ARR] ) AS RN_ARR,
ROW_NUMBER() OVER(PARTITION BY [SERV_DAY],[SERV_ID] ORDER BY [DEP] ) AS RN_DEP
FROM #demo
)
SELECT MAX(CASE WHEN RN_ARR = 1 THEN [STATION] END) MIN_DEP_STATION,
MAX(CASE WHEN RN_DEP = 1 THEN [STATION] END) MAX_ARR_STATION, [SERV_DAY], [SERV_ID], MAX_ARR, MIN_DEP from T
group by [SERV_DAY], [SERV_ID], MIN_DEP, MAX_ARR
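For experimenting with the row_number plus conditional aggregation pattern, here is a small sketch that runs against SQLite (3.25 or later) through Python's sqlite3 module. The table name `demo`, the three sample stops, and the assumption that the first stop has no arrival time and the last stop has no departure time are all hypothetical. The `... is null` sort keys push NULL times to the end so they never win the row_number = 1 slot.

```python
import sqlite3

# Hypothetical sample rows: (serv_day, serv_id, station, arr, dep)
SAMPLE_ROWS = [
    ("2019-08-14 00:00:00", "N170", "Downtown", None, "2019-08-14 06:06:00"),
    ("2019-08-14 00:00:00", "N170", "Midtown", "2019-08-14 06:30:00", "2019-08-14 06:32:00"),
    ("2019-08-14 00:00:00", "N170", "CentralStation", "2019-08-14 07:11:00", None),
]

SUMMARY_SQL = """
with t as (
    select serv_day, serv_id, station, arr, dep,
           -- earliest non-null departure gets rn_dep = 1
           row_number() over (partition by serv_day, serv_id
                              order by dep is null, dep) as rn_dep,
           -- latest non-null arrival gets rn_arr = 1
           row_number() over (partition by serv_day, serv_id
                              order by arr is null, arr desc) as rn_arr
    from demo
)
select serv_day, serv_id,
       max(case when rn_dep = 1 then station end) as first_station,
       min(dep) as min_dep,
       max(case when rn_arr = 1 then station end) as last_station,
       max(arr) as max_arr
from t
group by serv_day, serv_id
order by serv_day, serv_id
"""


def summarize_services(rows):
    """Return one summary row per (serv_day, serv_id)."""
    con = sqlite3.connect(":memory:")
    try:
        con.execute(
            "create table demo (serv_day text, serv_id text, station text, "
            "arr text, dep text)"
        )
        con.executemany("insert into demo values (?, ?, ?, ?, ?)", rows)
        return con.execute(SUMMARY_SQL).fetchall()
    finally:
        con.close()


if __name__ == "__main__":
    for row in summarize_services(SAMPLE_ROWS):
        print(row)
```

Making the NULL handling explicit avoids depending on how a particular database sorts NULLs, which differs between products.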

Oracle LEAD - return next matching column value

I have the data below in one table.
I want to get the next OUT value from the OUT column, so I used the LEAD function in the query below.
SELECT ROW_NUMBER,TIMESTAMP,IN,OUT,LEAD(OUT) OVER (PARTITION BY NULL ORDER BY TIMESTAMP) AS NEXT_OUT
FROM MYTABLE;
It gives the data shown in the NEXT_OUT column below.
But I need the next matching column value in sequence, as in the DESIRED columns. Please let me know how I can achieve this with the Oracle LEAD function.
Thanks
Assign row number to all INs and OUTs separately, sort the results by placing them in a single column and calculate LEADs:
WITH cte AS (
SELECT t.*
, CASE WHEN "IN" IS NOT NULL THEN COUNT("IN") OVER (ORDER BY "TIMESTAMP") END AS rn1
, CASE WHEN "OUT" IS NOT NULL THEN COUNT("OUT") OVER (ORDER BY "TIMESTAMP") END AS rn2
FROM t
)
SELECT cte.*
, LEAD("OUT") OVER (ORDER BY COALESCE(rn1, rn2), rn1 NULLS LAST) AS NEXT_OUT
FROM cte
ORDER BY COALESCE(rn1, rn2), rn1 NULLS LAST
Demo on db<>fiddle
Enumerate the "in"s and the "out"s and use that information for matching. The column names are quoted because IN is a reserved word in Oracle.
select tin.*, tout."OUT" as next_out
from (select t.*,
             count("IN") over (order by "TIMESTAMP") as seqnum_in
      from t
     ) tin left join
     (select t.*,
             count("OUT") over (order by "TIMESTAMP") as seqnum_out
      from t
     ) tout
     on tin."IN" is not null and
        tout."OUT" is not null and
        tin.seqnum_in = tout.seqnum_out;
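The pairing logic (the k-th non-null IN is matched with the k-th non-null OUT) can be tried out with the sketch below. It is only an illustration under assumptions: it uses SQLite (3.25 or later) via Python's sqlite3 module instead of Oracle, the columns are renamed to the hypothetical `ts`, `in_val` and `out_val` to sidestep reserved words, and the sample rows are invented because the original data was posted as an image.

```python
import sqlite3

# Hypothetical sample rows: (ts, in_val, out_val)
SAMPLE_ROWS = [
    (1, "A", None),
    (2, "B", None),
    (3, None, "X"),
    (4, None, "Y"),
    (5, "C", "Z"),
]

MATCH_SQL = """
select tin.ts, tin.in_val, tin.out_val, tout.out_val as next_out
from (select t.*,
             count(in_val) over (order by ts) as seqnum_in   -- running count of INs
      from t
     ) tin
left join
     (select t.*,
             count(out_val) over (order by ts) as seqnum_out -- running count of OUTs
      from t
     ) tout
  on tin.in_val is not null
 and tout.out_val is not null
 and tin.seqnum_in = tout.seqnum_out
order by tin.ts
"""


def match_in_out(rows):
    """Pair the k-th IN with the k-th OUT and return all rows with next_out."""
    con = sqlite3.connect(":memory:")
    try:
        con.execute("create table t (ts integer, in_val text, out_val text)")
        con.executemany("insert into t values (?, ?, ?)", rows)
        return con.execute(MATCH_SQL).fetchall()
    finally:
        con.close()


if __name__ == "__main__":
    for row in match_in_out(SAMPLE_ROWS):
        print(row)
```

Rows without an IN value keep a NULL next_out because the join condition requires a non-null IN on the left side.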

Vertica/SQL: Getting rows immediately proceeding events

Consider a simple query
select * from tbl where status = 'MELTDOWN'
I would like to now create a table that in addition to including these rows, also includes the previous p rows and the subsequent n rows, so that I can get a sense as to what happens in the surrounding time of these MELTDOWNs. Any hints?
You can do this with window functions by getting the seqnum of the meltdown rows. I prefer to do this with lag()/lead() ignore nulls, but Vertica doesn't support that. I think this is the equivalent with first_value()/last_value():
with t as (
      select tbl.*, row_number() over (order by id) as seqnum
      from tbl
     ),
     tt as (
      select t.*,
             last_value(case when status = 'meltdown' then seqnum end ignore nulls) over (order by seqnum rows between unbounded preceding and current row) as prev_meltdown_seqnum,
             first_value(case when status = 'meltdown' then seqnum end ignore nulls) over (order by seqnum rows between current row and unbounded following) as next_meltdown_seqnum
      from t
     )
select tt.*
from tt
-- 7 rows after and 5 rows before each meltdown; adjust to your n and p
where seqnum between prev_meltdown_seqnum and prev_meltdown_seqnum + 7 or
      seqnum between next_meltdown_seqnum - 5 and next_meltdown_seqnum;
WITH
grouped AS
(
SELECT
SUM(
CASE WHEN status = 'Meltdown' THEN 1 ELSE 0 END
)
OVER (
ORDER BY timeStamp
)
AS GroupID,
tbl.*
FROM
tbl
),
sorted AS
(
SELECT
ROW_NUMBER() OVER (PARTITION BY GroupID ORDER BY timeStamp ASC ) AS incPos,
ROW_NUMBER() OVER (PARTITION BY GroupID ORDER BY timeStamp DESC) AS decPos,
MAX(GroupID) OVER () AS LastGroup,
grouped.*
FROM
grouped
)
SELECT
sorted.*
FROM
sorted
WHERE
(incPos <= 8 AND GroupID > 0 ) -- Meltdown and the 7 events following it
OR (decPos <= 6 AND GroupID <> LastGroup) -- and the 6 events preceding a Meltdown
ORDER BY
timeStamp
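The grouping approach generalizes to arbitrary p (rows before) and n (rows after). The sketch below is a rough illustration, not Vertica code: it runs in SQLite (3.25 or later) via Python's sqlite3 module, orders by a hypothetical `id` column instead of a timestamp, and the helper name `rows_around_meltdowns` is made up.

```python
import sqlite3

WINDOW_SQL = """
with grouped as (
    select sum(case when status = 'MELTDOWN' then 1 else 0 end)
               over (order by id) as grp,      -- group 0 precedes the first meltdown
           tbl.*
    from tbl
),
sorted as (
    select row_number() over (partition by grp order by id)      as inc_pos,
           row_number() over (partition by grp order by id desc) as dec_pos,
           max(grp) over () as last_grp,
           grouped.*
    from grouped
)
select id
from sorted
where (inc_pos <= ? and grp > 0)          -- the meltdown row plus the n rows after it
   or (dec_pos <= ? and grp <> last_grp)  -- the p rows before the next meltdown
order by id
"""


def rows_around_meltdowns(statuses, p, n):
    """Return ids (1-based positions) of meltdown rows plus p rows before and n after."""
    con = sqlite3.connect(":memory:")
    try:
        con.execute("create table tbl (id integer primary key, status text)")
        con.executemany(
            "insert into tbl values (?, ?)",
            list(enumerate(statuses, start=1)),
        )
        return [row[0] for row in con.execute(WINDOW_SQL, (n + 1, p))]
    finally:
        con.close()


if __name__ == "__main__":
    statuses = ["ok"] * 4 + ["MELTDOWN"] + ["ok"] * 7
    print(rows_around_meltdowns(statuses, p=2, n=1))
```

A row that falls inside the window of two neighboring meltdowns is returned once, since the filter is a single WHERE clause over distinct rows.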

SQL Find the minimum date based on consecutive values

I'm having trouble constructing a query that can find consecutive values meeting a condition. Example data below, note that Date is sorted DESC and is grouped by ID.
To be selected, for each ID, the most recent RESULT must be 'Fail', and what I need back is the earliest date in that run of 'Fail's. For ID==1, only the first two values are of interest (the last doesn't count due to the prior 'Complete'). ID==2 doesn't count at all, failing the first condition, and for ID==3, only the first value matters.
A result table might be:
The trick seems to be doing some type of run-length encoding, but even with several attempts manipulating ROW_NUM and an attempt at the tabibitosan method for grouping consecutive values, I've been unable to gain traction.
Any help would be appreciated.
If your database supports window functions, you can do
select id, case when result='Fail' then earliest_fail_date end earliest_fail_date
from (
select t.*
,row_number() over(partition by id order by dt desc) rn
,min(case when result = 'Fail' then dt end) over(partition by id) earliest_fail_date
from tablename t
) x
where rn=1
Use row_number to get the latest row in the table. min() over() to get the earliest fail date for each id. If the first row has status Fail, you select the earliest_fail_date or else it would be null.
It should be noted that the expected result for id=1 is wrong. It should be 2016-09-20 as it is the earliest fail date.
Edit: Having re-read the question, I think this is what you might be looking for: getting the minimum Fail date from the latest consecutive group of Fail rows.
with grps as (
select t.*,row_number() over(partition by id order by dt desc) rn
,row_number() over(partition by id order by dt)-row_number() over(partition by id,result order by dt) grp
from tablename t
)
,maxfailgrp as (
select g.*,
max(case when result = 'Fail' then grp end) over(partition by id) maxgrp
from grps g
)
select id,
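-- restrict the lookup to Fail rows: grp values of Fail and non-Fail rows can coincide
case when result = 'Fail' then (select min(dt) from maxfailgrp where id = m.id and grp = m.maxgrp and result = 'Fail') end earliest_fail_date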
from maxfailgrp m
where rn=1
Sample Demo
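Since the original sample data was posted as an image, here is a self-contained sketch of the "latest run of Fail rows" query on invented data. It assumes SQLite (3.25 or later) through Python's sqlite3 module, and the rows are hypothetical. The inner lookup is restricted to Fail rows, because the row-number difference alone can yield the same grp value for a Fail run and an adjacent non-Fail run.

```python
import sqlite3

# Hypothetical sample rows: (id, dt, result)
SAMPLE_ROWS = [
    (1, "2016-09-20", "Fail"),
    (1, "2016-09-25", "Complete"),
    (1, "2016-10-01", "Fail"),      # start of the latest Fail run for id 1
    (1, "2016-10-05", "Fail"),
    (2, "2016-09-01", "Fail"),
    (2, "2016-09-02", "Complete"),  # latest row is not a Fail: no date returned
    (3, "2016-10-09", "Complete"),
    (3, "2016-10-10", "Fail"),
]

LATEST_FAIL_SQL = """
with grps as (
    select t.*,
           row_number() over (partition by id order by dt desc) as rn,
           -- the difference of the two row numbers is constant within a run
           row_number() over (partition by id order by dt)
             - row_number() over (partition by id, result order by dt) as grp
    from tablename t
),
maxfailgrp as (
    select g.*,
           max(case when result = 'Fail' then grp end) over (partition by id) as maxgrp
    from grps g
)
select m.id,
       case when m.result = 'Fail' then
            (select min(f.dt)
             from maxfailgrp f
             where f.id = m.id and f.grp = m.maxgrp and f.result = 'Fail')
       end as earliest_fail_date
from maxfailgrp m
where m.rn = 1
order by m.id
"""


def earliest_date_of_latest_fail_run(rows):
    """Return (id, date) pairs; date is None when the latest row is not a Fail."""
    con = sqlite3.connect(":memory:")
    try:
        con.execute("create table tablename (id integer, dt text, result text)")
        con.executemany("insert into tablename values (?, ?, ?)", rows)
        return con.execute(LATEST_FAIL_SQL).fetchall()
    finally:
        con.close()


if __name__ == "__main__":
    for row in earliest_date_of_latest_fail_run(SAMPLE_ROWS):
        print(row)
```

To drop the IDs whose latest result is not 'Fail', as the question describes for ID 2, an outer filter on a non-null date would be enough.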